1 INTRODUCTION:

1.1 What is Peer-to-Peer Lending?

Peer-to-peer (P2P) lending became a phenomenon less than ten years ago, exploding in popularity by offering a break from traditional banking. Individuals flocked to these alternative credit markets as new sources of funding and as opportunities to finance their small business ventures. P2P lending enables individuals to obtain loans directly from other individuals, cutting out the financial institution as the middleman. It is also known as “social lending” or “crowd lending.” It has only existed since 2005, but the crowd of competitors already includes Prosper, LendingClub, Upstart, and StreetShares.

P2P lending websites connect borrowers directly to investors. Each site sets the rates and terms and enables the transactions. Most sites have a wide range of interest rates based on the creditworthiness of the applicant. P2P lenders are individual investors who want a better return on their cash savings than they would get from a bank savings account or certificate of deposit. P2P borrowers seek an alternative to traditional banks or a lower interest rate. The default rates for P2P loans are much higher than those in traditional finance.

To begin, an investor opens an account with the site and deposits a sum of money to be disbursed in loans. The loan applicant posts a financial profile that is assigned a risk category, which determines the interest rate the applicant will pay. The loan applicant can review offers and accept one. (Some applicants break up their requests into chunks and accept multiple offers.) The money transfer and the monthly payments are handled through the platform. The process can be entirely automated, or lenders and borrowers can choose to haggle.

Some sites specialize in particular types of borrowers. StreetShares, for example, is designed for small businesses, and LendingClub has a “Patient Solutions” category that links doctors who offer financing programs with prospective patients.

Peer-to-peer lending is riskier than a savings account or certificate of deposit, but the interest rates are often much higher. This is because people who invest through a peer-to-peer lending site assume most of the risk, which is normally assumed by banks or other financial institutions. Although direct P2P lending has undergone changes over recent years, it remains a viable option for borrowers and investors. The global peer-to-peer lending market was worth $83.79 billion in 2021, according to figures from Precedence Research, and is projected to reach $705.81 billion by 2031.

The simplest way to invest in peer-to-peer lending is to make an account on a P2P lending site and begin lending money to borrowers. These sites typically let lenders choose the profile of their borrowers, so they can choose between high risk with high returns or more modest returns. Alternatively, many P2P lending sites are public companies, so one can also invest in them by buying their stock.

1.2 Introducing the LendingClub:

LendingClub is a financial services company headquartered in San Francisco, California. It was the first peer-to-peer lender to register its offerings as securities with the Securities and Exchange Commission, and to offer loan trading on a secondary market. LendingClub enabled borrowers to create unsecured personal loans between $1,000 and $40,000. The standard loan period was three years. Investors were able to search and browse the loan listings on the LendingClub website and select loans that they wanted to invest in based on the information supplied about the borrower, amount of loan, loan grade, and loan purpose. Investors made money from the interest on these loans. LendingClub made money by charging borrowers an origination fee and investors a service fee.

LendingClub screens potential borrowers and services the loans once they’re approved. The risk is that investors, not LendingClub, make the final decision whether or not to lend the money. That decision is based on the LendingClub grade, which draws on credit and income data and is assigned to every approved borrower. That data, known only to the investors, also helps determine the range of interest rates offered to the borrower. LendingClub’s typical annual percentage rate (APR) is between 5.99% and 35.89%. There is also an origination fee of 1% to 6% taken off the top of the loan. Once a loan is approved, the funds arrive in the borrower’s bank account in about one week. The monthly repayment schedule stretches over three to five years (36 to 60 monthly payments).

LendingClub loans are generally pursued by borrowers with good-to-excellent credit (scores average 700) and a low debt-to-income ratio (the average is 12%). Borrowers can file a joint application, which could lead to a larger loan because of multiple incomes. LendingClub probably isn’t the best option for borrowers with bad credit. Bad credit would bring a high interest rate and a steep origination fee, meaning such borrowers could probably do better with a different type of loan.

Here, we seek to understand the factors that might have signaled risky loans or borrowing practices, so that our findings can be applied by prospective borrowers, lenders, and investors considering direct P2P lending via LendingClub.

1.3 Our Data

Our dataset contains over 9,500 observations of loan data from LendingClub, the largest online platform for direct P2P lending. We obtained the dataset from Kaggle: [https://www.kaggle.com/datasets/urstrulyvikas/lending-club-loan-data-analysis]

P2P lending rose to popularity between 2007 and 2015, as can be seen in the timeline:
[Image: timeline of the rise of P2P lending]

So, we believe that the timeframe of 2007 to 2015 provides the most relevant data for prospective individual investors today, particularly because it is unlikely to include a significant number of large institutional lenders.

In our dataset the variables are defined as follows:

# This is copied from the Kaggle site
# We will use a kable table for simplicity
data_definitions <- data.frame(variable = c("credit.policy", "purpose", "int.rate", "installment", "log.annual.inc", "dti", "fico", "days.with.cr.line", "revol.bal", "revol.util", "inq.last.6mths", "delinq.2yrs", "pub.rec", "not.fully.paid"),
                          definition = c("1 if the customer meets the credit underwriting criteria of LendingClub.com, and 0 otherwise.",
                                         "The purpose of the loan (takes values creditcard, debtconsolidation, educational, majorpurchase, smallbusiness, and all_other).",
                                         "The interest rate of the loan, as a proportion (a rate of 11% would be stored as 0.11). Borrowers judged by LendingClub.com to be more risky are assigned higher interest rates.",
                                         "The monthly installments owed by the borrower if the loan is funded.",
                                         "The natural log of the self-reported annual income of the borrower.",
                                         "The debt-to-income ratio of the borrower (amount of debt divided by annual income).",
                                         "The FICO credit score of the borrower.",
                                         "The number of days the borrower has had a credit line.",
                                         "The borrower's revolving balance (amount unpaid at the end of the credit card billing cycle).",
                                         "The borrower's revolving line utilization rate (the amount of the credit line used relative to total credit available).",
                                         "The borrower's number of inquiries by creditors in the last 6 months.",
                                         "The number of times the borrower had been 30+ days past due on a payment in the past 2 years.",
                                         "The borrower's number of derogatory public records (bankruptcy filings, tax liens, or judgments).",
                                         "Whether the borrower will be fully paid or not."))

knitr::kable(data_definitions)
| variable | definition |
|---|---|
| credit.policy | 1 if the customer meets the credit underwriting criteria of LendingClub.com, and 0 otherwise. |
| purpose | The purpose of the loan (takes values creditcard, debtconsolidation, educational, majorpurchase, smallbusiness, and all_other). |
| int.rate | The interest rate of the loan, as a proportion (a rate of 11% would be stored as 0.11). Borrowers judged by LendingClub.com to be more risky are assigned higher interest rates. |
| installment | The monthly installments owed by the borrower if the loan is funded. |
| log.annual.inc | The natural log of the self-reported annual income of the borrower. |
| dti | The debt-to-income ratio of the borrower (amount of debt divided by annual income). |
| fico | The FICO credit score of the borrower. |
| days.with.cr.line | The number of days the borrower has had a credit line. |
| revol.bal | The borrower’s revolving balance (amount unpaid at the end of the credit card billing cycle). |
| revol.util | The borrower’s revolving line utilization rate (the amount of the credit line used relative to total credit available). |
| inq.last.6mths | The borrower’s number of inquiries by creditors in the last 6 months. |
| delinq.2yrs | The number of times the borrower had been 30+ days past due on a payment in the past 2 years. |
| pub.rec | The borrower’s number of derogatory public records (bankruptcy filings, tax liens, or judgments). |
| not.fully.paid | Whether the borrower will be fully paid or not. |
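
Two of these definitions involve transformed values, and a short sketch may make them more concrete. The numbers below are the `int.rate` and `log.annual.inc` values of the first observation as printed in the head of the data in section 2.4; the variable names `int_rate`, `log_annual_inc`, and `annual_inc` are our own and purely illustrative.

```r
# Values from the first observation of the dataset (see the head of the data).
int_rate <- 0.1189         # interest rate stored as a proportion
log_annual_inc <- 11.3504  # natural log of self-reported annual income

# Convert the proportion to a percentage.
int_rate_pct <- 100 * int_rate

# Undo the natural log to recover annual income in dollars.
annual_inc <- exp(log_annual_inc)

cat(sprintf("Interest rate: %.2f%%\n", int_rate_pct))
cat(sprintf("Annual income: about $%s\n",
            format(round(annual_inc, -3), big.mark = ",")))
```

Under these assumptions, the first borrower pays an interest rate of 11.89% and reports an annual income of roughly $85,000.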

2 EXPLORATORY DATA ANALYSIS (EDA):

As with any analysis, ours starts with the formulation of a SMART question, followed by EDA. By SMART question we mean a problem statement arising from our dataset that has the following characteristics:

  1. Specific
  2. Measurable
  3. Achievable
  4. Relevant
  5. Time-oriented

To structure our EDA efficiently, we will closely adhere to the 9-step checklist[1] presented in Chapter 4 of The Art of Data Science.[2]

These are the elements of our checklist:

  1. Formulate our question
  2. Read in our data
  3. Check the packaging
  4. Look at the top and the bottom of your data
  5. Check your “n”s
  6. Validate with at least one external data source
  7. Make a plot
  8. Try the easy solution first
  9. Follow up

2.1 1.Formulate our question:

Our analysis will explore debt-to-income ratios, credit scores, interest rates, and delinquencies among direct P2P borrowers in an attempt to understand the risks and opportunities associated with P2P lending. Specifically, we intend to examine the impact that these variables have on who received loans and who defaulted on their loans between 2007 and 2015. We also intend to depict graphically how the variables are related to each other. Our primary intention is to understand the dataset and answer the following questions:

  1. What variable or variables, if any, have an impact on whether the person meets the credit underwriting criteria? How strong is that impact?
  2. What variable or variables, if any, have an impact on whether the person fully repays the loan? How strong is that impact?
  3. Do borrowers who meet the credit underwriting criteria have a lower chance of not fully repaying the loan? If so, how big a difference is it, and is it statistically significant?

2.2 2.Read in our data:

We start by loading the tidyverse and ezids libraries[3] and reading in our dataset. We also load the corrplot, scales, and gridExtra packages to support our EDA. If we require any other packages, we will import them along the way.

library(ezids) # We will use functions from this package to get nicer looking results
library(tidyverse) # We need this package for data manipulation, piping, and graphing
library(corrplot) # We will need this package to plot the correlation matrix
library(scales) # This package will help us when labeling the scatter plots
library(gridExtra) # For additional table and image functionality
# Read in the data from the working directory
loans <- read_csv("loan_data.csv")
#loans

2.3 3.Check the packaging:

We can see that our dataset contains 9578 rows of data with 14 columns. Next let’s examine the structure of the dataset. This will give us a better understanding of what we are dealing with. This is essentially a comprehensive preview of the type of the variables in our data and one of the introductory steps of any EDA.

# There is unfortunately no ezids function to see the result in a nice looking table, so we will use the standard function.
str(loans)
## spec_tbl_df [9,578 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ credit.policy    : num [1:9578] 1 1 1 1 1 1 1 1 1 1 ...
##  $ purpose          : chr [1:9578] "debt_consolidation" "credit_card" "debt_consolidation" "debt_consolidation" ...
##  $ int.rate         : num [1:9578] 0.119 0.107 0.136 0.101 0.143 ...
##  $ installment      : num [1:9578] 829 228 367 162 103 ...
##  $ log.annual.inc   : num [1:9578] 11.4 11.1 10.4 11.4 11.3 ...
##  $ dti              : num [1:9578] 19.5 14.3 11.6 8.1 15 ...
##  $ fico             : num [1:9578] 737 707 682 712 667 727 667 722 682 707 ...
##  $ days.with.cr.line: num [1:9578] 5640 2760 4710 2700 4066 ...
##  $ revol.bal        : num [1:9578] 28854 33623 3511 33667 4740 ...
##  $ revol.util       : num [1:9578] 52.1 76.7 25.6 73.2 39.5 51 76.8 68.6 51.1 23 ...
##  $ inq.last.6mths   : num [1:9578] 0 0 1 1 0 0 0 0 1 1 ...
##  $ delinq.2yrs      : num [1:9578] 0 0 0 0 1 0 0 0 0 0 ...
##  $ pub.rec          : num [1:9578] 0 0 0 0 0 0 1 0 0 0 ...
##  $ not.fully.paid   : num [1:9578] 0 0 0 0 0 0 1 1 0 0 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   credit.policy = col_double(),
##   ..   purpose = col_character(),
##   ..   int.rate = col_double(),
##   ..   installment = col_double(),
##   ..   log.annual.inc = col_double(),
##   ..   dti = col_double(),
##   ..   fico = col_double(),
##   ..   days.with.cr.line = col_double(),
##   ..   revol.bal = col_double(),
##   ..   revol.util = col_double(),
##   ..   inq.last.6mths = col_double(),
##   ..   delinq.2yrs = col_double(),
##   ..   pub.rec = col_double(),
##   ..   not.fully.paid = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

By examining the structure of our data, we can see that there is only one character variable which might be a factor, and some of the numeric variables look like logicals.

2.4 4.Look at the top and the bottom of your data:

We can also look at the top and bottom rows of our dataset to get a better feel for the data. This will help us better understand the values in our dataset and how to most effectively deal with them.

The first 5 observations of our dataset are as follows:

# We use the xkabledplyhead() function from the ezids package to see the result in a nice looking table.
xkabledplyhead(loans)
Head

| credit.policy | purpose | int.rate | installment | log.annual.inc | dti | fico | days.with.cr.line | revol.bal | revol.util | inq.last.6mths | delinq.2yrs | pub.rec | not.fully.paid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | debt_consolidation | 0.1189 | 829.10 | 11.3504 | 19.48 | 737 | 5639.958 | 28854 | 52.1 | 0 | 0 | 0 | 0 |
| 1 | credit_card | 0.1071 | 228.22 | 11.0821 | 14.29 | 707 | 2760.000 | 33623 | 76.7 | 0 | 0 | 0 | 0 |
| 1 | debt_consolidation | 0.1357 | 366.86 | 10.3735 | 11.63 | 682 | 4710.000 | 3511 | 25.6 | 1 | 0 | 0 | 0 |
| 1 | debt_consolidation | 0.1008 | 162.34 | 11.3504 | 8.10 | 712 | 2699.958 | 33667 | 73.2 | 1 | 0 | 0 | 0 |
| 1 | credit_card | 0.1426 | 102.92 | 11.2997 | 14.97 | 667 | 4066.000 | 4740 | 39.5 | 0 | 1 | 0 | 0 |

The bottom 5 observations of our dataset are as follows:

# We use the xkabledplytail() function from the ezids package to see the result in a nice looking table.
xkabledplytail(loans)
Tail

| credit.policy | purpose | int.rate | installment | log.annual.inc | dti | fico | days.with.cr.line | revol.bal | revol.util | inq.last.6mths | delinq.2yrs | pub.rec | not.fully.paid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | all_other | 0.1461 | 344.76 | 12.1808 | 10.39 | 672 | 10474.000 | 215372 | 82.1 | 2 | 0 | 0 | 1 |
| 0 | all_other | 0.1253 | 257.70 | 11.1419 | 0.21 | 722 | 4380.000 | 184 | 1.1 | 5 | 0 | 0 | 1 |
| 0 | debt_consolidation | 0.1071 | 97.81 | 10.5966 | 13.09 | 687 | 3450.042 | 10036 | 82.9 | 8 | 0 | 0 | 1 |
| 0 | home_improvement | 0.1600 | 351.58 | 10.8198 | 19.18 | 692 | 1800.000 | 0 | 3.2 | 5 | 0 | 0 | 1 |
| 0 | debt_consolidation | 0.1392 | 853.43 | 11.2645 | 16.28 | 732 | 4740.000 | 37879 | 57.0 | 6 | 0 | 0 | 1 |

The top and bottom rows of our dataset indicate the data is structured in an acceptable way and that our variables match up with the values for each column. This means we can now move on to the descriptive statistics part of our EDA.

2.5 5.Check your “n”s:

We can get some descriptive statistics of the variables to help us better understand the data. The xkablesummary() function gives us the five-number summary of each variable, along with the mean.

# We use the xkablesummary() function from the ezids package to see the result in a nice looking table.
xkablesummary(loans)
Table: Statistics summary.

| | credit.policy | purpose | int.rate | installment | log.annual.inc | dti | fico | days.with.cr.line | revol.bal | revol.util | inq.last.6mths | delinq.2yrs | pub.rec | not.fully.paid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Min | 0.000 | Length: 9578 | 0.0600 | 15.67 | 7.548 | 0.000 | 612.0 | 179 | 0 | 0.0 | 0.000 | 0.0000 | 0.00000 | 0.0000 |
| Q1 | 1.000 | Class: character | 0.1039 | 163.77 | 10.558 | 7.213 | 682.0 | 2820 | 3187 | 22.6 | 0.000 | 0.0000 | 0.00000 | 0.0000 |
| Median | 1.000 | Mode: character | 0.1221 | 268.95 | 10.929 | 12.665 | 707.0 | 4140 | 8596 | 46.3 | 1.000 | 0.0000 | 0.00000 | 0.0000 |
| Mean | 0.805 | NA | 0.1226 | 319.09 | 10.932 | 12.607 | 710.8 | 4561 | 16914 | 46.8 | 1.577 | 0.1637 | 0.06212 | 0.1601 |
| Q3 | 1.000 | NA | 0.1407 | 432.76 | 11.291 | 17.950 | 737.0 | 5730 | 18250 | 70.9 | 2.000 | 0.0000 | 0.00000 | 0.0000 |
| Max | 1.000 | NA | 0.2164 | 940.14 | 14.528 | 29.960 | 827.0 | 17640 | 1207359 | 119.0 | 33.000 | 13.0000 | 5.00000 | 1.0000 |

From this we can see that some of the variables that appeared to be logicals, like inq.last.6mths, delinq.2yrs, and pub.rec, are actually not. However, credit.policy and not.fully.paid are.
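
One way to confirm which numeric columns are really 0/1 indicators is to test whether every value is either 0 or 1. The sketch below does this on a small sample built from the head and tail rows shown in section 2.4; the data frame `sample_loans` and the helper `is_binary()` are hypothetical names introduced only for illustration, and the check is meant to be run on the full `loans` data rather than on ten rows.

```r
# Ten rows taken from the head and tail of the dataset (selected columns only).
sample_loans <- data.frame(
  credit.policy  = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0),
  fico           = c(737, 707, 682, 712, 667, 672, 722, 687, 692, 732),
  inq.last.6mths = c(0, 0, 1, 1, 0, 2, 5, 8, 5, 6),
  not.fully.paid = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
)

# Hypothetical helper: TRUE when a column contains nothing but 0s and 1s.
is_binary <- function(x) all(x %in% c(0, 1))

# Apply the check to every column and collect one logical per column.
binary_flags <- vapply(sample_loans, is_binary, logical(1))
print(binary_flags)
```

A column whose first few values are all 0 or 1 can still hold larger counts further down, which is why the summary statistics (in particular the maximum) are a more reliable guide than the first rows.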

We have an idea of what to expect for a few variables, such as interest rate and credit score, so we were able to test the dataset against some of our expectations to gauge its reliability. By inspecting the data frame, we can see that interest rates range from 6% to 21.64% and credit scores range from 612 to 827. Although the highest interest rates might seem excessive and the lowest credit scores meager, the P2P market tended to consist of riskier loans. This aligned with our expectations and reinforced our confidence in the dataset.

The range of the utilization, or the percent of credit being used, is between 0% and 119%. Someone utilizing more than 100% of the credit available to them initially seemed erroneous; however, this can occur from technical error, creditors and collectors reporting at different date/times, borrowers opening and closing credit lines, or possibly when borrowers appear as authorized users of others’ credit lines. Regardless, only 27 loans within our dataset appear to exceed the standard maximum of 100% so we do not expect this to have a significant effect on our analysis, thereby allowing us to move on to the next step of our EDA.
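
The count of loans above 100% utilization comes from a simple logical comparison. The sketch below shows the idea on a short vector of hypothetical utilization values (they are not rows from the dataset, apart from the maximum of 119.0 reported in the summary), and then puts the 27 affected loans in proportion to the 9,578 rows in the data.

```r
# Hypothetical utilization values (percent of available credit), for illustration only.
revol_util <- c(52.1, 76.7, 25.6, 101.5, 119.0)

# Count the values above the nominal 100% ceiling.
n_over <- sum(revol_util > 100)

# Share of the full dataset affected, using the counts reported in the text.
share_over <- 27 / 9578

cat(sprintf("%d of %d sample values exceed 100%%\n", n_over, length(revol_util)))
cat(sprintf("Share of all loans above 100%%: %.2f%%\n", 100 * share_over))
```

On the real data the same count would be `sum(loans$revol.util > 100)`. At roughly 0.28% of all observations, these loans are unlikely to drive any of the results.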

Before we move on, we also want to look at a measure of dispersion, namely the standard deviation, for the numeric variables as part of the descriptive statistics section of our EDA. They are as follows:

library(expss)

tab = loans %>%
   tab_cells(credit.policy,int.rate, installment, log.annual.inc, dti, fico ,days.with.cr.line,revol.bal,revol.util,inq.last.6mths,delinq.2yrs,pub.rec,not.fully.paid) %>%
  tab_cols() %>% 
    tab_stat_sd(label = "Std. dev.") %>%
    tab_pivot() %>% 
    set_caption("Table with Standard Deviation of all numeric variables.")
tab
Table with Standard Deviation of all numeric variables.

| Variable | Std. dev. (#Total) |
|---|---|
| credit.policy | 0.4 |
| int.rate | 0.0 |
| installment | 207.1 |
| log.annual.inc | 0.6 |
| dti | 6.9 |
| fico | 38.0 |
| days.with.cr.line | 2496.9 |
| revol.bal | 33756.2 |
| revol.util | 29.0 |
| inq.last.6mths | 2.2 |
| delinq.2yrs | 0.5 |
| pub.rec | 0.3 |
| not.fully.paid | 0.4 |
png("sd.png", height=700, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
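
For readers who want to see what the table is computing, the sketch below reproduces the sample standard deviation from its definition using base R, on the ten `fico` values from the head and tail rows shown in section 2.4. It also illustrates why `int.rate` prints as 0.0: the table rounds to one decimal place, while the variance of `int.rate` reported in the covariance matrix of section 2.9.2 (0.0007207607) implies a standard deviation of about 0.027. The variable names are our own.

```r
# Ten FICO scores taken from the head and tail rows of the dataset.
fico <- c(737, 707, 682, 712, 667, 672, 722, 687, 692, 732)

# Sample standard deviation from its definition (n - 1 in the denominator).
manual_sd <- sqrt(sum((fico - mean(fico))^2) / (length(fico) - 1))

# Base R's sd() uses the same definition.
builtin_sd <- sd(fico)

# The standard deviation is the square root of the variance, so the variance
# of int.rate from the covariance matrix gives its unrounded standard deviation.
int_rate_sd <- sqrt(0.0007207607)

print(c(manual = manual_sd, builtin = builtin_sd, int.rate = int_rate_sd))
```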

2.6 6.Validate with at least one external data source:

According to the Kaggle site where we obtained this dataset, there are 9,578 rows and 14 columns, which matches what we have. The site also shows that there is no missing data. We verified this by counting the missing cells in the dataset, which is 0, and the null cells, which is also 0. We also checked whether the observations are unique, and all 9,578 rows are.
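
The report does not show the code behind these checks, so the following is only a minimal sketch of how they could be done in base R. It runs on a five-row sample taken from the head and tail of the data; `sample_loans` is a hypothetical name, and on the real data the same calls would be applied to `loans`.

```r
# Five rows taken from the head and tail of the dataset (selected columns only).
sample_loans <- data.frame(
  credit.policy = c(1, 1, 1, 0, 0),
  purpose = c("debt_consolidation", "credit_card", "debt_consolidation",
              "all_other", "all_other"),
  int.rate = c(0.1189, 0.1071, 0.1357, 0.1461, 0.1253),
  fico = c(737, 707, 682, 672, 722),
  stringsAsFactors = FALSE
)

# Total number of missing (NA) cells across all columns.
n_missing <- sum(is.na(sample_loans))

# Number of distinct rows, and number of rows that repeat an earlier row.
n_unique_rows <- nrow(unique(sample_loans))
n_duplicates <- sum(duplicated(sample_loans))

cat("Missing cells:", n_missing, "\n")
cat("Unique rows:", n_unique_rows, "of", nrow(sample_loans), "\n")
cat("Duplicated rows:", n_duplicates, "\n")
```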

2.7 7.Make a plot:

Let’s start by making a histogram for each non-logical numeric variable.

# By gathering the variables we want to see into a long format with the gather() function, we can then create a histogram
# for each variable using the facet_wrap() function in ggplot2.
loans %>%
  select(int.rate, installment, log.annual.inc, dti, fico, days.with.cr.line, revol.bal, revol.util,
           inq.last.6mths, delinq.2yrs, pub.rec) %>%
  gather(variable, value) %>%
  ggplot(aes(x = value)) +
  geom_histogram(fill = "steelblue", color = "black") +
  facet_wrap(~ variable, scales = "free") + # Free scales so the graphs are readable
  labs(title = "Histograms of Numeric Variables", x = "Value", y = "Count") +
  theme_minimal()

Some of these variables look somewhat normal, and it would make sense to create a QQ-Plot for them later. But first let’s create boxplots for these same variables.
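
As a preview of what such a QQ-plot involves, the sketch below uses base R's `qqnorm()` on the ten `fico` values from the head and tail rows shown in section 2.4. This is an illustration on a tiny sample, not the plot the analysis would eventually use; the correlation between theoretical and sample quantiles is added here as a rough numeric companion to the visual check and is our own choice.

```r
# Ten FICO scores taken from the head and tail rows of the dataset.
fico <- c(737, 707, 682, 712, 667, 672, 722, 687, 692, 732)

# Compute the QQ-plot coordinates without drawing:
# x holds the theoretical normal quantiles, y the observed values.
qq <- qqnorm(fico, plot.it = FALSE)

# Points close to a straight line give a correlation near 1.
qq_cor <- cor(qq$x, qq$y)
cat(sprintf("Correlation of sample vs. normal quantiles: %.3f\n", qq_cor))

# Draw the plot with a reference line when running interactively.
if (interactive()) {
  qqnorm(fico, main = "Normal Q-Q Plot of fico (sample rows)")
  qqline(fico)
}
```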

# By gathering the variables we want to see into a long format with the gather() function, we can then create a boxplot
# for each variable using the facet_wrap() function in ggplot2.
loans %>%
  select(int.rate, installment, log.annual.inc, dti, fico, days.with.cr.line, revol.bal, revol.util,
           inq.last.6mths, delinq.2yrs, pub.rec) %>%
  gather(variable, value) %>%
  ggplot(aes(x = value)) +
  geom_boxplot(fill = "steelblue", color = "black",
               outlier.size = 2, outlier.alpha = 0.2) + # Translucent and larger outliers to help with overplotting
  facet_wrap(~ variable, scales = "free") + # Free scales so the graphs are readable
  labs(title = "Boxplots of Numeric Variables", x = "Value") +
  theme_minimal() +
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())

We can see that some of these variables have issues with outliers.

Now let’s look at the factor and logical variables with bar charts.

# By gathering the variables we want to see into a long format with the gather() function, we can then create a bar graph
# for each variable using the facet_wrap() function in ggplot2.
loans %>%
  select(credit.policy, purpose, not.fully.paid) %>%
  gather(variable, value) %>%
  ggplot(aes(x = value)) +
  geom_bar(fill = "steelblue", color = "black") +
  facet_wrap(~ variable, scales = "free") + # Free scales so the graphs are readable
  coord_flip() +
  labs(title = "Bar Charts of Non-Numeric Variables", x = "Value", y = "Count") +
  theme_minimal()

From this we can see that the purpose variable would be a good candidate to perform ANOVA tests on.
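
A one-way ANOVA of this kind could be sketched as follows. The choice of `fico` as the response is an assumption made purely for illustration, and the ten rows (taken from the head and tail of the data in section 2.4) are far too few to draw conclusions from; `sample_loans` and `fit` are hypothetical names.

```r
# Ten rows taken from the head and tail of the dataset (purpose and fico only).
sample_loans <- data.frame(
  purpose = factor(c("debt_consolidation", "credit_card", "debt_consolidation",
                     "debt_consolidation", "credit_card", "all_other", "all_other",
                     "debt_consolidation", "home_improvement", "debt_consolidation")),
  fico = c(737, 707, 682, 712, 667, 672, 722, 687, 692, 732)
)

# One-way ANOVA: does mean fico differ across loan purposes?
fit <- aov(fico ~ purpose, data = sample_loans)

# The summary reports the F statistic and p-value for the purpose factor.
print(summary(fit))
```

On the full dataset the same call would be `aov(fico ~ purpose, data = loans)`, with any other numeric variable substituted for `fico` as needed.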

2.8 8.Try the easy solution first:

Let’s try the easy way of looking at meeting the credit underwriting criteria vs the borrower fully paying. We can group by the credit.policy variable, and calculate the percentage of borrowers in each category who did not fully pay.

# We will convert the average to an easier to read percentage by multiplying by 100, rounding, and adding a "%" at the end.
loans %>%
  group_by(credit.policy) %>%
  summarize(percent_not_fully_paid = paste0(round(100*mean(not.fully.paid), 1), "%"))  %>%
  ungroup() %>%
  knitr::kable(align = "c")
| credit.policy | percent_not_fully_paid |
|:---:|:---:|
| 0 | 27.8% |
| 1 | 13.2% |

From this we can see that about 13.2% of borrowers who met the credit underwriting criteria did not fully pay, while for the borrowers who did not meet the credit underwriting criteria about 27.8% did not fully pay.

This indicates that borrowers who did not meet the credit underwriting criteria were more than twice as likely to default on their loans as those who did meet the criteria. For comparison, default rates on loans from commercial banks for the same period as our dataset averaged 4.48%, with a maximum of 7.49% towards the end of 2009, according to the St. Louis Federal Reserve Bank.[4]
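
The size of the gap between the two groups can be made explicit with a little arithmetic on the percentages from the table above. The variable names below are our own, and because the inputs are rounded percentages the results are approximate.

```r
# Percent of borrowers who did not fully pay, from the table above.
rate_not_met <- 27.8  # did not meet the credit underwriting criteria
rate_met <- 13.2      # met the credit underwriting criteria

# Relative difference: how many times higher the rate is for the first group.
ratio <- rate_not_met / rate_met

# Absolute difference in percentage points.
diff_points <- rate_not_met - rate_met

cat(sprintf("Relative rate: %.2f times\n", ratio))
cat(sprintf("Difference: %.1f percentage points\n", diff_points))
```

This gives a ratio of about 2.11 and a gap of 14.6 percentage points. Judging statistical significance would additionally require the underlying counts in each group, for example as input to a two-sample test of proportions such as base R's `prop.test()`.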

2.9 9.Follow up:

Let us go back to the table of data definitions and add a column for the variable type.

# Add a type column and reorder it so that definition is last
data_definitions_augmented <- data_definitions %>%
  mutate(type = c("Logical", "Factor", "Numeric", "Numeric", "Numeric", "Numeric", "Integer", "Numeric", "Integer", "Numeric", "Integer", "Integer", "Integer", "Logical")) %>%
  select(variable, type, definition)
# We will use a kable table for simplicity
knitr::kable(data_definitions_augmented)
| variable | type | definition |
|---|---|---|
| credit.policy | Logical | 1 if the customer meets the credit underwriting criteria of LendingClub.com, and 0 otherwise. |
| purpose | Factor | The purpose of the loan (takes values creditcard, debtconsolidation, educational, majorpurchase, smallbusiness, and all_other). |
| int.rate | Numeric | The interest rate of the loan, as a proportion (a rate of 11% would be stored as 0.11). Borrowers judged by LendingClub.com to be more risky are assigned higher interest rates. |
| installment | Numeric | The monthly installments owed by the borrower if the loan is funded. |
| log.annual.inc | Numeric | The natural log of the self-reported annual income of the borrower. |
| dti | Numeric | The debt-to-income ratio of the borrower (amount of debt divided by annual income). |
| fico | Integer | The FICO credit score of the borrower. |
| days.with.cr.line | Numeric | The number of days the borrower has had a credit line. |
| revol.bal | Integer | The borrower’s revolving balance (amount unpaid at the end of the credit card billing cycle). |
| revol.util | Numeric | The borrower’s revolving line utilization rate (the amount of the credit line used relative to total credit available). |
| inq.last.6mths | Integer | The borrower’s number of inquiries by creditors in the last 6 months. |
| delinq.2yrs | Integer | The number of times the borrower had been 30+ days past due on a payment in the past 2 years. |
| pub.rec | Integer | The borrower’s number of derogatory public records (bankruptcy filings, tax liens, or judgments). |
| not.fully.paid | Logical | Whether the borrower will be fully paid or not. |

Let’s also convert some of the variables to a more appropriate type. The credit.policy and not.fully.paid variables are logicals, and we will use the purpose variable as a factor.

# These variables may act differently from here on out
loans$credit.policy <- as.logical(loans$credit.policy)
loans$not.fully.paid <- as.logical(loans$not.fully.paid)

loans$purpose <- as.factor(loans$purpose)
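
To illustrate how these variables behave after conversion, the sketch below applies the same `as.logical()` and `as.factor()` calls to a few values taken from the head and tail rows in section 2.4. The standalone vectors are our own stand-ins for the corresponding columns of `loans`.

```r
# A few values taken from the head and tail rows of the dataset.
credit_policy <- c(1, 1, 1, 0, 0)
purpose <- c("debt_consolidation", "credit_card", "debt_consolidation",
             "all_other", "all_other")

# 1 becomes TRUE and 0 becomes FALSE.
credit_policy_lgl <- as.logical(credit_policy)

# Character values become factor levels, sorted alphabetically by default.
purpose_fct <- as.factor(purpose)

# Logicals still count as 0/1 in arithmetic, so the mean is the share of TRUEs.
share_meeting_policy <- mean(credit_policy_lgl)

print(credit_policy_lgl)
print(levels(purpose_fct))
print(table(purpose_fct))
cat("Share meeting the credit policy:", share_meeting_policy, "\n")
```

Because logicals keep their 0/1 meaning in arithmetic, summaries such as `mean(not.fully.paid)` continue to work after the conversion, while plots and tables now label the groups as TRUE and FALSE.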

Now we can further explore the data to see how the numeric variables differ based on the credit.policy, not.fully.paid, and purpose variables. Let’s make some boxplots to visualize this.

2.9.1 Additional Boxplots

# By gathering the variables we want to see into a long format with the gather() function, we can then create a boxplot
# for each variable using the facet_wrap() function in ggplot2. We can see this for each credit policy value by excluding
# it in the gather() function.
loans %>%
  select(int.rate, installment, log.annual.inc, dti, fico, days.with.cr.line, revol.bal, revol.util,
           inq.last.6mths, delinq.2yrs, pub.rec, credit.policy) %>%
  gather(variable, value, -credit.policy) %>%
  ggplot(aes(x = value, y = as.logical(credit.policy), fill = as.logical(credit.policy))) +
  geom_boxplot(outlier.size = 2, outlier.alpha = 0.2) +  # Translucent and larger outliers to help with overplotting
  guides(fill = guide_legend(reverse = TRUE)) + # So the legend order matches the order in the graphs
  facet_wrap(~ variable, scales = "free_x") + # Free x scale so the graphs are readable
  labs(title = "Boxplots of Numeric Variables", subtitle = "Comparing `credit.policy` Values",
       x = "Value", y = "Credit Policy", fill = "Credit Policy") +
  theme_minimal()

# By gathering the variables we want to see into a long format with the gather() function, we can then create a boxplot
# for each variable using the facet_wrap() function in ggplot2. We can see this for each not fully paid value by excluding
# it in the gather() function.
loans %>%
  select(int.rate, installment, log.annual.inc, dti, fico, days.with.cr.line, revol.bal, revol.util,
           inq.last.6mths, delinq.2yrs, pub.rec, not.fully.paid) %>%
  gather(variable, value, -not.fully.paid) %>%
  ggplot(aes(x = value, y = as.logical(not.fully.paid), fill = as.logical(not.fully.paid))) +
  geom_boxplot(outlier.size = 2, outlier.alpha = 0.2) +  # Translucent and larger outliers to help with overplotting
  guides(fill = guide_legend(reverse = TRUE)) + # So the legend order matches the order in the graphs
  facet_wrap(~ variable, scales = "free_x") + # Free x scale so the graphs are readable
  labs(title = "Boxplots of Numeric Variables", subtitle = "Comparing `not.fully.paid` Values",
       x = "Value", y = "Not Fully Paid", fill = "Not Fully Paid") +
  theme_minimal()

# By gathering the variables we want to see into a long format with the gather() function, we can then create a boxplot
# for each variable using the facet_wrap() function in ggplot2. We can see this for each purpose value by excluding
# it in the gather() function.
loans %>%
  select(int.rate, installment, log.annual.inc, dti, fico, days.with.cr.line, revol.bal, revol.util,
           inq.last.6mths, delinq.2yrs, pub.rec, purpose) %>%
  gather(variable, value, -purpose) %>%
  ggplot(aes(x = value, y = purpose, fill = purpose)) +
  geom_boxplot(outlier.size = 2, outlier.alpha = 0.2) +
  guides(fill = guide_legend(reverse = TRUE)) + # So the legend order matches the order in the graphs
  facet_wrap(~ variable, scales = "free_x") + # Free x scale so the graphs are readable
  labs(title = "Boxplots of Numeric Variables", subtitle = "Comparing `purpose` Values",
       x = "Value", y = "Purpose", fill = "Purpose") +
  theme_minimal()

2.9.2 Covariance Matrix

loans_covarience_matrix <- loans %>%
  select(-purpose) %>%
  cov()
loans_covarience_matrix
##                   credit.policy      int.rate   installment log.annual.inc
## credit.policy      1.570099e-01 -0.0031285128  4.822100e+00   8.503673e-03
## int.rate          -3.128513e-03  0.0007207607  1.535130e+00   9.306428e-04
## installment        4.822100e+00  1.5351296712  4.287852e+04   5.704792e+01
## log.annual.inc     8.503673e-03  0.0009306428  5.704792e+01   3.779947e-01
## dti               -2.479528e-01  0.0406600857  7.156135e+01  -2.288211e-01
## fico               5.240672e+00 -0.7286843825  6.764941e+02   2.674749e+00
## days.with.cr.line  9.797606e+01 -8.3138330083  9.477258e+04   5.171847e+02
## revol.bal         -2.508193e+03 83.8528268415  1.633027e+06   7.723287e+03
## revol.util        -1.196760e+00  0.3620848507  4.887925e+02   9.789922e-01
## inq.last.6mths    -4.668777e-01  0.0119782213 -4.746828e+00   3.946114e-02
## delinq.2yrs       -1.651796e-02  0.0022887736 -4.940054e-01   9.807039e-03
## pub.rec           -5.634017e-03  0.0006907971 -1.778157e+00   2.660161e-03
## not.fully.paid    -2.297364e-02  0.0015706471  3.792995e+00  -7.538466e-03
##                             dti          fico days.with.cr.line     revol.bal
## credit.policy     -2.479528e-01  5.240672e+00      9.797606e+01 -2.508193e+03
## int.rate           4.066009e-02 -7.286844e-01     -8.313833e+00  8.385283e+01
## installment        7.156135e+01  6.764941e+02      9.477258e+04  1.633027e+06
## log.annual.inc    -2.288211e-01  2.674749e+00      5.171847e+02  7.723287e+03
## dti                4.738904e+01 -6.304443e+01      1.033066e+03  4.386056e+04
## fico              -6.304443e+01  1.441762e+03      2.501838e+04 -1.993428e+04
## days.with.cr.line  1.033066e+03  2.501838e+04      6.234661e+06  1.933070e+07
## revol.bal          4.386056e+04 -1.993428e+04      1.933070e+07  1.139480e+09
## revol.util         6.733229e+01 -5.963347e+02     -1.756061e+03  1.995845e+05
## inq.last.6mths     4.421091e-01 -1.548021e+01     -2.292940e+02  1.663280e+03
## delinq.2yrs       -8.194136e-02 -4.486898e+00      1.109825e+02 -6.129401e+02
## pub.rec            1.120352e-02 -1.468994e+00      4.701103e+01 -2.743852e+02
## not.fully.paid     9.430733e-02 -2.083784e+00     -2.676802e+01  6.646676e+02
##                      revol.util inq.last.6mths   delinq.2yrs       pub.rec
## credit.policy     -1.196760e+00    -0.46687768 -1.651796e-02 -5.634017e-03
## int.rate           3.620849e-01     0.01197822  2.288774e-03  6.907971e-04
## installment        4.887925e+02    -4.74682830 -4.940054e-01 -1.778157e+00
## log.annual.inc     9.789922e-01     0.03946114  9.807039e-03  2.660161e-03
## dti                6.733229e+01     0.44210915 -8.194136e-02  1.120352e-02
## fico              -5.963347e+02   -15.48020939 -4.486898e+00 -1.468994e+00
## days.with.cr.line -1.756061e+03  -229.29404099  1.109825e+02  4.701103e+01
## revol.bal          1.995845e+05  1663.28033876 -6.129401e+02 -2.743852e+02
## revol.util         8.418364e+02    -0.88607632 -6.773480e-01  5.074089e-01
## inq.last.6mths    -8.860763e-01     4.84107945  2.553287e-02  4.191352e-02
## delinq.2yrs       -6.773480e-01     0.02553287  2.983507e-01  1.314967e-03
## pub.rec            5.074089e-01     0.04191352  1.314967e-03  6.871021e-02
## not.fully.paid     8.733217e-01     0.12057426  1.778727e-03  4.674501e-03
##                   not.fully.paid
## credit.policy       -0.022973644
## int.rate             0.001570647
## installment          3.792994623
## log.annual.inc      -0.007538466
## dti                  0.094307333
## fico                -2.083784075
## days.with.cr.line  -26.768023630
## revol.bal          664.667576358
## revol.util           0.873321666
## inq.last.6mths       0.120574263
## delinq.2yrs          0.001778727
## pub.rec              0.004674501
## not.fully.paid       0.134450952
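# Save the covariance matrix as a table image (tableGrob() and grid.arrange() come from gridExtra)
png("loans_covarience_matrix.png", height = 2000, width = 2000)
p <- tableGrob(loans_covarience_matrix)
grid.arrange(p)
dev.off()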
## quartz_off_screen 
##                 2
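
The covariance values above are hard to compare because each entry carries the units of both variables: the variance of int.rate is about 0.0007, while the variance of revol.bal is about 1.1e9. As a minimal sketch, using simulated data rather than the loans data (the column names `balance` and `rate` are hypothetical), dividing each covariance by the product of the two standard deviations gives the unit-free correlation:

```r
# Simulated data (not the loans data); column names are illustrative only.
set.seed(42)
n <- 500
balance <- rnorm(n, mean = 15000, sd = 30000)           # large-scale variable
rate    <- 0.12 + 1e-7 * balance + rnorm(n, sd = 0.02)  # small-scale variable
toy <- data.frame(balance = balance, rate = rate)

cov_mat <- cov(toy)   # entries depend on the units of each column
cor_mat <- cor(toy)   # unit-free, always within [-1, 1]

# Divide each covariance by the product of the two standard deviations.
sds <- sqrt(diag(cov_mat))
manual_cor <- cov_mat / outer(sds, sds)

all.equal(manual_cor, cor_mat)         # TRUE
all.equal(cov2cor(cov_mat), cor_mat)   # TRUE: cov2cor() does the same rescaling
```

This is why the correlation matrix in the next section is usually the easier of the two to read.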

2.9.3 Correlation Matrix and Heatmap

# For our correlation matrix we want to include everything but the purpose variable
loans_correlation_matrix <- loans %>%
  select(-purpose) %>%
  cor()
loans_correlation_matrix
# Save the correlation matrix as a table image, as we did for the covariance matrix
png("loans_correlation_matrix.png", height = 2000, width = 2000)
p <- tableGrob(loans_correlation_matrix)
grid.arrange(p)
dev.off()

# The mixed correlation plot makes a nice visualization
corrplot.mixed(loans_correlation_matrix, tl.pos = "lt")
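
Alongside the plot, it can help to list the variable pairs with the strongest correlations. The following is a rough sketch of one way to do that in base R; it uses the built-in mtcars data as a stand-in for the loans data, and the name `cor_pairs` is our own:

```r
# Built-in mtcars data stands in for the loans data here.
cor_mat <- cor(mtcars[, c("mpg", "disp", "hp", "wt", "qsec")])

# Keep each pair once by taking the upper triangle only.
idx <- which(upper.tri(cor_mat), arr.ind = TRUE)
cor_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)

# Sort by strength of association, ignoring sign.
cor_pairs <- cor_pairs[order(-abs(cor_pairs$r)), ]
head(cor_pairs, 3)
```

The same steps should apply to loans_correlation_matrix, since it is also a square matrix with row and column names.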

2.9.4 Scatter Plots

loans %>%
  ggplot(aes(x = fico, y = int.rate)) +
  geom_point(color = "steelblue", alpha = 0.2) +
  labs(title = "Interest Rate vs FICO Score",
       x = "FICO Score", y = "Interest Rate") +
  scale_x_continuous(limits = c(600, NA), expand = expansion(mult = c(0, .05))) +
  scale_y_continuous(labels = label_percent(), limits = c(.05, NA), expand = expansion(mult = c(0, .05))) +
  theme_minimal()

loans %>%
  ggplot(aes(x = int.rate, y = revol.util)) +
  geom_point(color = "steelblue", alpha = 0.2) +
  labs(title = "Revolving Line Utilization Rate vs Interest Rate",
       x = "Interest Rate", y = "Revolving Line Utilization Rate") +
  scale_x_continuous(labels = label_percent(), limits = c(.05, NA), expand = expansion(mult = c(0, .05))) +
  scale_y_continuous(labels = label_percent(scale = 1)) +
  theme_minimal()

loans %>%
  ggplot(aes(x = log.annual.inc, y = installment)) +
  geom_point(color = "steelblue", alpha = 0.2) +
  labs(title = "Installment vs Log of Annual Income",
       x = "Log of Annual Income", y = "Installment") +
  theme_minimal()

loans %>%
  ggplot(aes(x = fico, y = revol.util)) +
  geom_point(color = "steelblue", alpha = 0.2) +
  labs(title = "Revolving Line Utilization Rate vs FICO Score",
       x = "FICO Score", y = "Revolving Line Utilization Rate") +
  scale_x_continuous(limits = c(600, NA), expand = expansion(mult = c(0, .05))) +
  scale_y_continuous(labels = label_percent(scale = 1)) +
  theme_minimal()

3 STATISTICAL TESTS:

Interpretation5 6 7 8 9 10

3.1 T-Tests

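A t-test is a statistical test that compares means. In its two-sample form it is used in hypothesis testing, with a null hypothesis that the difference in group means is zero and an alternative hypothesis that the difference in group means is different from zero. The calls below pass a single variable to t.test(), so each one is a one-sample test of whether that variable's mean differs from zero (the default mu = 0), reported with 95%, 99%, and 50% confidence intervals. We can perform t-tests for each numeric variable in our dataset. Not all of the results may be useful, but we want to have them available for further consideration.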
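
When t.test() is given a single vector, it runs a one-sample test against a null mean of zero. As a minimal sketch on simulated data (not the loans data), the statistic, degrees of freedom, and confidence interval it reports can be reproduced by hand:

```r
# Simulated sample (not the loans data).
set.seed(1)
x <- rnorm(200, mean = 0.12, sd = 0.03)

n  <- length(x)
se <- sd(x) / sqrt(n)           # standard error of the mean
t_stat <- (mean(x) - 0) / se    # null hypothesis: the true mean is 0
df <- n - 1                     # reported as "parameter" by broom::tidy()

# Two-sided 95% confidence interval for the mean.
ci <- mean(x) + c(-1, 1) * qt(0.975, df) * se

fit <- t.test(x)                # mu = 0 and conf.level = 0.95 by default
all.equal(unname(fit$statistic), t_stat)   # TRUE
all.equal(as.numeric(fit$conf.int), ci)    # TRUE
```

In the tables that follow, the parameter value of 9577 is therefore the number of loans minus one.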


library(broom)
library(purrr)

ttest95rate = t.test(x=loans$int.rate) # default conf.level = 0.95
ttest99rate = t.test(x=loans$int.rate, conf.level=0.99 )
ttest50rate = t.test(x=loans$int.rate, conf.level=0.50 )

tab <- map_df(list(ttest95rate, ttest99rate, ttest50rate),tidy)
tab
## # A tibble: 3 × 8
##   estimate statistic p.value parameter conf.low conf.high method         alter…¹
##      <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>          <chr>  
## 1    0.123      447.       0      9577    0.122     0.123 One Sample t-… two.si…
## 2    0.123      447.       0      9577    0.122     0.123 One Sample t-… two.si…
## 3    0.123      447.       0      9577    0.122     0.123 One Sample t-… two.si…
## # … with abbreviated variable name ¹​alternative
png("t1.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
ttest95installment = t.test(x=loans$installment) # default conf.level = 0.95
ttest99installment = t.test(x=loans$installment, conf.level=0.99 )
ttest50installment = t.test(x=loans$installment, conf.level=0.50 )

tab <- map_df(list(ttest95installment,ttest99installment,ttest50installment), tidy)
tab
## # A tibble: 3 × 8
##   estimate statistic p.value parameter conf.low conf.high method         alter…¹
##      <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>          <chr>  
## 1     319.      151.       0      9577     315.      323. One Sample t-… two.si…
## 2     319.      151.       0      9577     314.      325. One Sample t-… two.si…
## 3     319.      151.       0      9577     318.      321. One Sample t-… two.si…
## # … with abbreviated variable name ¹​alternative
png("t2.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
ttest95annual = t.test(x=loans$log.annual.inc) # default conf.level = 0.95
ttest99annual = t.test(x=loans$log.annual.inc, conf.level=0.99 )
ttest50annual = t.test(x=loans$log.annual.inc, conf.level=0.50 )

tab <- map_df(list(ttest95annual,ttest99annual,ttest50annual), tidy)
tab
## # A tibble: 3 × 8
##   estimate statistic p.value parameter conf.low conf.high method         alter…¹
##      <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>          <chr>  
## 1     10.9     1740.       0      9577     10.9      10.9 One Sample t-… two.si…
## 2     10.9     1740.       0      9577     10.9      10.9 One Sample t-… two.si…
## 3     10.9     1740.       0      9577     10.9      10.9 One Sample t-… two.si…
## # … with abbreviated variable name ¹​alternative
png("t3.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
ttest95fico = t.test(x=loans$fico) # default conf.level = 0.95
ttest99fico = t.test(x=loans$fico, conf.level=0.99 )
ttest50fico = t.test(x=loans$fico, conf.level=0.50 )

tab <- map_df(list(ttest95fico,ttest99fico,ttest50fico), tidy)
tab
## # A tibble: 3 × 8
##   estimate statistic p.value parameter conf.low conf.high method         alter…¹
##      <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>          <chr>  
## 1     711.     1832.       0      9577     710.      712. One Sample t-… two.si…
## 2     711.     1832.       0      9577     710.      712. One Sample t-… two.si…
## 3     711.     1832.       0      9577     711.      711. One Sample t-… two.si…
## # … with abbreviated variable name ¹​alternative
png("t4.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
ttest95dti = t.test(x=loans$dti) # default conf.level = 0.95
ttest99dti = t.test(x=loans$dti, conf.level=0.99 )
ttest50dti = t.test(x=loans$dti, conf.level=0.50 )

tab <- map_df(list(ttest95dti,ttest99dti,ttest50dti), tidy)
tab
## # A tibble: 3 × 8
##   estimate statistic p.value parameter conf.low conf.high method         alter…¹
##      <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>          <chr>  
## 1     12.6      179.       0      9577     12.5      12.7 One Sample t-… two.si…
## 2     12.6      179.       0      9577     12.4      12.8 One Sample t-… two.si…
## 3     12.6      179.       0      9577     12.6      12.7 One Sample t-… two.si…
## # … with abbreviated variable name ¹​alternative
png("t5.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
ttest95days.with.cr.line = t.test(x=loans$days.with.cr.line) # default conf.level = 0.95
ttest99days.with.cr.line = t.test(x=loans$days.with.cr.line, conf.level=0.99 )
ttest50days.with.cr.line = t.test(x=loans$days.with.cr.line, conf.level=0.50 )


tab <- map_df(list(ttest95days.with.cr.line,ttest99days.with.cr.line,ttest50days.with.cr.line), tidy)
tab
## # A tibble: 3 × 8
##   estimate statistic p.value parameter conf.low conf.high method         alter…¹
##      <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>          <chr>  
## 1    4561.      179.       0      9577    4511.     4611. One Sample t-… two.si…
## 2    4561.      179.       0      9577    4495.     4626. One Sample t-… two.si…
## 3    4561.      179.       0      9577    4544.     4578. One Sample t-… two.si…
## # … with abbreviated variable name ¹​alternative
png("t6.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
ttest95revol.bal = t.test(x=loans$revol.bal) # default conf.level = 0.95
ttest99revol.bal = t.test(x=loans$revol.bal, conf.level=0.99 )
ttest50revol.bal = t.test(x=loans$revol.bal, conf.level=0.50 )

tab <- map_df(list(ttest95revol.bal,ttest99revol.bal,ttest50revol.bal), tidy)
tab
## # A tibble: 3 × 8
##   estimate statistic p.value parameter conf.low conf.high method         alter…¹
##      <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>          <chr>  
## 1   16914.      49.0       0      9577   16238.    17590. One Sample t-… two.si…
## 2   16914.      49.0       0      9577   16025.    17803. One Sample t-… two.si…
## 3   16914.      49.0       0      9577   16681.    17147. One Sample t-… two.si…
## # … with abbreviated variable name ¹​alternative
png("t7.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
ttest95revol.util = t.test(x=loans$revol.util) # default conf.level = 0.95
ttest99revol.util = t.test(x=loans$revol.util, conf.level=0.99 )
ttest50revol.util = t.test(x=loans$revol.util, conf.level=0.50 )

tab <- map_df(list(ttest95revol.util,ttest99revol.util,ttest50revol.util), tidy)
tab
## # A tibble: 3 × 8
##   estimate statistic p.value parameter conf.low conf.high method         alter…¹
##      <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>          <chr>  
## 1     46.8      158.       0      9577     46.2      47.4 One Sample t-… two.si…
## 2     46.8      158.       0      9577     46.0      47.6 One Sample t-… two.si…
## 3     46.8      158.       0      9577     46.6      47.0 One Sample t-… two.si…
## # … with abbreviated variable name ¹​alternative
png("t8.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
ttest95inq.last.6mths = t.test(x=loans$inq.last.6mths) # default conf.level = 0.95
ttest99inq.last.6mths = t.test(x=loans$inq.last.6mths, conf.level=0.99 )
ttest50inq.last.6mths = t.test(x=loans$inq.last.6mths, conf.level=0.50 )

tab <- map_df(list(ttest95inq.last.6mths,ttest99inq.last.6mths,ttest50inq.last.6mths), tidy)
tab
## # A tibble: 3 × 8
##   estimate statistic p.value parameter conf.low conf.high method         alter…¹
##      <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>          <chr>  
## 1     1.58      70.2       0      9577     1.53      1.62 One Sample t-… two.si…
## 2     1.58      70.2       0      9577     1.52      1.64 One Sample t-… two.si…
## 3     1.58      70.2       0      9577     1.56      1.59 One Sample t-… two.si…
## # … with abbreviated variable name ¹​alternative
png("t9.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
ttest95delinq.2yrs = t.test(x=loans$delinq.2yrs) # default conf.level = 0.95
ttest99delinq.2yrs = t.test(x=loans$delinq.2yrs, conf.level=0.99 )
ttest50delinq.2yrs = t.test(x=loans$delinq.2yrs, conf.level=0.50 )

tab <- map_df(list(ttest95delinq.2yrs,ttest99delinq.2yrs,ttest50delinq.2yrs), tidy)
tab
## # A tibble: 3 × 8
##   estimate statistic   p.value parameter conf.low conf.high method       alter…¹
##      <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <chr>        <chr>  
## 1    0.164      29.3 3.51e-181      9577    0.153     0.175 One Sample … two.si…
## 2    0.164      29.3 3.51e-181      9577    0.149     0.178 One Sample … two.si…
## 3    0.164      29.3 3.51e-181      9577    0.160     0.167 One Sample … two.si…
## # … with abbreviated variable name ¹​alternative
png("t10.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
ttest95pub.rec = t.test(x=loans$pub.rec) # default conf.level = 0.95
ttest99pub.rec = t.test(x=loans$pub.rec, conf.level=0.99 )
ttest50pub.rec = t.test(x=loans$pub.rec, conf.level=0.50 )

tab <- map_df(list(ttest95pub.rec,ttest99pub.rec,ttest50pub.rec), tidy)
tab
## # A tibble: 3 × 8
##   estimate statistic   p.value parameter conf.low conf.high method       alter…¹
##      <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <chr>        <chr>  
## 1   0.0621      23.2 7.89e-116      9577   0.0569    0.0674 One Sample … two.si…
## 2   0.0621      23.2 7.89e-116      9577   0.0552    0.0690 One Sample … two.si…
## 3   0.0621      23.2 7.89e-116      9577   0.0603    0.0639 One Sample … two.si…
## # … with abbreviated variable name ¹​alternative
png("t11.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
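
The blocks above repeat the same three calls for every variable. One possible way to condense them is sketched below in base R, with the built-in mtcars data standing in for the loans data; the helper name `ci_table()` is hypothetical and is not part of broom or purrr:

```r
# Built-in mtcars data stands in for the loans data; ci_table() is a
# hypothetical helper written for this sketch.
ci_table <- function(data, vars, levels = c(0.95, 0.99, 0.50)) {
  rows <- list()
  for (v in vars) {
    for (lv in levels) {
      fit <- t.test(data[[v]], conf.level = lv)
      rows[[length(rows) + 1]] <- data.frame(
        variable   = v,
        conf.level = lv,
        estimate   = unname(fit$estimate),
        statistic  = unname(fit$statistic),
        p.value    = fit$p.value,
        conf.low   = fit$conf.int[1],
        conf.high  = fit$conf.int[2]
      )
    }
  }
  do.call(rbind, rows)   # one row per variable and confidence level
}

tab <- ci_table(mtcars, c("mpg", "hp", "wt"))
tab
```

Within each variable, only the interval width changes with the confidence level; the estimate, statistic, and p-value stay the same, which matches the tables above.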

3.2 ANOVA Tests

ANOVA stands for Analysis of Variance. It’s a statistical test that was developed by Ronald Fisher in 1918 and has been in use ever since. Put simply, ANOVA tells you if there are any statistical differences between the means of three or more independent groups. One-way ANOVA is the most basic form. Tukey’s Honest Significant Difference (HSD) test is a post hoc test commonly used to assess the significance of differences between pairs of group means. Tukey HSD is often a follow-up to one-way ANOVA, when the F-test has revealed the existence of a significant difference between some of the tested groups. The purpose variable in our data is a good candidate for ANOVA tests. We will run an ANOVA test for each variable based on purpose, and follow each one with a Tukey test. Again, not all of the results may be useful, but we want to have them available for further consideration.
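
To make the F-test concrete, here is a minimal sketch of a one-way ANOVA computed by hand and checked against aov(). It uses the built-in PlantGrowth data (three groups) as a stand-in for the loans data:

```r
# Built-in PlantGrowth data (30 plants, 3 groups) stands in for the loans data.
fit <- aov(weight ~ group, data = PlantGrowth)

grand_mean  <- mean(PlantGrowth$weight)
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
group_sizes <- tapply(PlantGrowth$weight, PlantGrowth$group, length)

# Variation between the group means and within the groups.
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum(
  (PlantGrowth$weight - group_means[as.character(PlantGrowth$group)])^2
)

k <- length(group_means)   # number of groups
n <- nrow(PlantGrowth)     # number of observations

# F is the between-group mean square over the within-group mean square.
f_manual <- (ss_between / (k - 1)) / (ss_within / (n - k))

f_aov <- summary(fit)[[1]][["F value"]][1]
all.equal(f_manual, f_aov)   # TRUE
```

The degrees of freedom follow the same pattern in our results: seven purpose values give 6 for purpose, and 9578 loans minus 7 groups give 9571 for the residuals.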

aovrate=aov(int.rate ~ purpose, data = loans)
aovratesummary=summary(aovrate)
aovratesummary
##               Df Sum Sq Mean Sq F value Pr(>F)    
## purpose        6  0.351 0.05850   85.46 <2e-16 ***
## Residuals   9571  6.552 0.00068                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aovrateturkey=TukeyHSD(aovrate)
aovrateturkey
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = int.rate ~ purpose, data = loans)
## 
## $purpose
##                                              diff           lwr           upr
## credit_card-all_other                0.0029676657  0.0002712055  0.0056641258
## debt_consolidation-all_other         0.0098244685  0.0078099637  0.0118389734
## educational-all_other                0.0031367610 -0.0013252303  0.0075987522
## home_improvement-all_other           0.0007359906 -0.0027307042  0.0042026854
## major_purchase-all_other            -0.0025995895 -0.0066215462  0.0014223673
## small_business-all_other             0.0213165483  0.0178278713  0.0248052252
## debt_consolidation-credit_card       0.0068568029  0.0043625116  0.0093510941
## educational-credit_card              0.0001690953 -0.0045290559  0.0048672465
## home_improvement-credit_card        -0.0022316751 -0.0059974727  0.0015341226
## major_purchase-credit_card          -0.0055672551 -0.0098497071 -0.0012848031
## small_business-credit_card           0.0183488826  0.0145628390  0.0221349262
## educational-debt_consolidation      -0.0066877076 -0.0110305128 -0.0023449023
## home_improvement-debt_consolidation -0.0090884779 -0.0124003602 -0.0057765957
## major_purchase-debt_consolidation   -0.0124240580 -0.0163133674 -0.0085347486
## small_business-debt_consolidation    0.0114920797  0.0081571947  0.0148269648
## home_improvement-educational        -0.0024007703 -0.0075795444  0.0027780037
## major_purchase-educational          -0.0057363504 -0.0113021266 -0.0001705743
## small_business-educational           0.0181797873  0.0129862726  0.0233733020
## major_purchase-home_improvement     -0.0033355801 -0.0081404183  0.0014692582
## small_business-home_improvement      0.0205805576  0.0162123542  0.0249487611
## small_business-major_purchase        0.0239161377  0.0190954153  0.0287368602
##                                         p adj
## credit_card-all_other               0.0201496
## debt_consolidation-all_other        0.0000000
## educational-all_other               0.3688491
## home_improvement-all_other          0.9959908
## major_purchase-all_other            0.4762263
## small_business-all_other            0.0000000
## debt_consolidation-credit_card      0.0000000
## educational-credit_card             0.9999999
## home_improvement-credit_card        0.5837844
## major_purchase-credit_card          0.0024341
## small_business-credit_card          0.0000000
## educational-debt_consolidation      0.0001153
## home_improvement-debt_consolidation 0.0000000
## major_purchase-debt_consolidation   0.0000000
## small_business-debt_consolidation   0.0000000
## home_improvement-educational        0.8194333
## major_purchase-educational          0.0383509
## small_business-educational          0.0000000
## major_purchase-home_improvement     0.3848074
## small_business-home_improvement     0.0000000
## small_business-major_purchase       0.0000000
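
With 21 pairs per variable, the Tukey output is long. As a rough sketch, again using the built-in PlantGrowth data in place of the loans data, the significant pairs can be pulled out of the TukeyHSD() result directly:

```r
# Built-in PlantGrowth data stands in for the loans data.
fit <- aov(weight ~ group, data = PlantGrowth)
tukey <- TukeyHSD(fit)

# TukeyHSD() returns a list with one matrix per factor; the columns are
# diff, lwr, upr and "p adj", and the row names are the group pairs.
m <- tukey$group

# Keep the pairs whose adjusted p-value is below 0.05.
significant <- m[m[, "p adj"] < 0.05, , drop = FALSE]
significant
```

For our data the matrix would be read from the `$purpose` element instead of `$group`. A pair is significant at the 5% level exactly when its interval from lwr to upr excludes zero, as with credit_card-all_other above.
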
aovinstallment=aov(installment ~ purpose, data = loans)
aovinstallmentsummary=summary(aovinstallment)
aovinstallmentsummary
##               Df    Sum Sq Mean Sq F value Pr(>F)    
## purpose        6  33502096 5583683   141.7 <2e-16 ***
## Residuals   9571 377145527   39405                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aovinstallmentturkey=TukeyHSD(aovinstallment)
aovinstallmentturkey
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = installment ~ purpose, data = loans)
## 
## $purpose
##                                            diff        lwr         upr
## credit_card-all_other                 74.563171   54.10481   95.021537
## debt_consolidation-all_other         114.046848   98.76256  129.331137
## educational-all_other                -27.390341  -61.24400    6.463320
## home_improvement-all_other            92.134048   65.83182  118.436275
## major_purchase-all_other              -1.453629  -31.96870   29.061438
## small_business-all_other             188.889066  162.42006  215.358074
## debt_consolidation-credit_card        39.483677   20.55919   58.408162
## educational-credit_card             -101.953512 -137.59895  -66.308077
## home_improvement-credit_card          17.570877  -11.00068   46.142433
## major_purchase-credit_card           -76.016800 -108.50828  -43.525325
## small_business-credit_card           114.325894   85.60073  143.051059
## educational-debt_consolidation      -141.437189 -174.38657 -108.487806
## home_improvement-debt_consolidation  -21.912800  -47.04045    3.214846
## major_purchase-debt_consolidation   -115.500477 -145.00913  -85.991822
## small_business-debt_consolidation     74.842218   49.54005  100.144389
## home_improvement-educational         119.524389   80.23241  158.816366
## major_purchase-educational            25.936712  -16.29150   68.164920
## small_business-educational           216.279406  176.87559  255.683222
## major_purchase-home_improvement      -93.587677 -130.04256  -57.132795
## small_business-home_improvement       96.755018   63.61294  129.897099
## small_business-major_purchase        190.342694  153.76730  226.918091
##                                         p adj
## credit_card-all_other               0.0000000
## debt_consolidation-all_other        0.0000000
## educational-all_other               0.2045045
## home_improvement-all_other          0.0000000
## major_purchase-all_other            0.9999994
## small_business-all_other            0.0000000
## debt_consolidation-credit_card      0.0000000
## educational-credit_card             0.0000000
## home_improvement-credit_card        0.5388086
## major_purchase-credit_card          0.0000000
## small_business-credit_card          0.0000000
## educational-debt_consolidation      0.0000000
## home_improvement-debt_consolidation 0.1348022
## major_purchase-debt_consolidation   0.0000000
## small_business-debt_consolidation   0.0000000
## home_improvement-educational        0.0000000
## major_purchase-educational          0.5403655
## small_business-educational          0.0000000
## major_purchase-home_improvement     0.0000000
## small_business-home_improvement     0.0000000
## small_business-major_purchase       0.0000000
aovannual=aov(log.annual.inc ~ purpose, data = loans)
aovannualsummary=summary(aovannual)
aovannualsummary
##               Df Sum Sq Mean Sq F value Pr(>F)    
## purpose        6    163  27.224   75.38 <2e-16 ***
## Residuals   9571   3457   0.361                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aovannualturkey=TukeyHSD(aovannual)
aovannualturkey
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = log.annual.inc ~ purpose, data = loans)
## 
## $purpose
##                                             diff         lwr         upr
## credit_card-all_other                0.201917013  0.13998034  0.26385368
## debt_consolidation-all_other         0.067595668  0.02132325  0.11386808
## educational-all_other               -0.295357330 -0.39784758 -0.19286708
## home_improvement-all_other           0.356665308  0.27703664  0.43629398
## major_purchase-all_other            -0.000417996 -0.09280082  0.09196483
## small_business-all_other             0.300902905  0.22076931  0.38103650
## debt_consolidation-credit_card      -0.134321345 -0.19161427 -0.07702842
## educational-credit_card             -0.497274342 -0.60518910 -0.38935958
## home_improvement-credit_card         0.154748296  0.06824935  0.24124724
## major_purchase-credit_card          -0.202335009 -0.30070131 -0.10396870
## small_business-credit_card           0.098985892  0.01202190  0.18594988
## educational-debt_consolidation      -0.362952998 -0.46270559 -0.26320040
## home_improvement-debt_consolidation  0.289069640  0.21299696  0.36514232
## major_purchase-debt_consolidation   -0.068013664 -0.15734963  0.02132230
## small_business-debt_consolidation    0.233307237  0.15670619  0.30990829
## home_improvement-educational         0.652022638  0.53306815  0.77097712
## major_purchase-educational           0.294939334  0.16709556  0.42278311
## small_business-educational           0.596260234  0.47696716  0.71555330
## major_purchase-home_improvement     -0.357083304 -0.46744862 -0.24671798
## small_business-home_improvement     -0.055762404 -0.15609839  0.04457358
## small_business-major_purchase        0.301320901  0.19059073  0.41205107
##                                         p adj
## credit_card-all_other               0.0000000
## debt_consolidation-all_other        0.0003342
## educational-all_other               0.0000000
## home_improvement-all_other          0.0000000
## major_purchase-all_other            1.0000000
## small_business-all_other            0.0000000
## debt_consolidation-credit_card      0.0000000
## educational-credit_card             0.0000000
## home_improvement-credit_card        0.0000028
## major_purchase-credit_card          0.0000000
## small_business-credit_card          0.0139438
## educational-debt_consolidation      0.0000000
## home_improvement-debt_consolidation 0.0000000
## major_purchase-debt_consolidation   0.2714642
## small_business-debt_consolidation   0.0000000
## home_improvement-educational        0.0000000
## major_purchase-educational          0.0000000
## small_business-educational          0.0000000
## major_purchase-home_improvement     0.0000000
## small_business-home_improvement     0.6569306
## small_business-major_purchase       0.0000000
aovdti=aov(dti ~ purpose, data = loans)
aovdtisummary=summary(aovdti)
aovdtisummary
##               Df Sum Sq Mean Sq F value Pr(>F)    
## purpose        6  25645    4274   95.54 <2e-16 ***
## Residuals   9571 428200      45                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aovdtiturkey=TukeyHSD(aovdti)
aovdtiturkey
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dti ~ purpose, data = loans)
## 
## $purpose
##                                            diff        lwr          upr
## credit_card-all_other                3.01989971  2.3305501  3.709249320
## debt_consolidation-all_other         2.99696390  2.4819561  3.511971740
## educational-all_other                0.26542904 -0.8752783  1.406136404
## home_improvement-all_other          -0.88199409 -1.7682541  0.004265877
## major_purchase-all_other            -0.91961249 -1.9478251  0.108600126
## small_business-all_other            -0.28620243 -1.1780821  0.605677282
## debt_consolidation-credit_card      -0.02293582 -0.6606010  0.614729338
## educational-credit_card             -2.75447067 -3.9555523 -1.553389051
## home_improvement-credit_card        -3.90189381 -4.8646194 -2.939168245
## major_purchase-credit_card          -3.93951220 -5.0343204 -2.844704021
## small_business-credit_card          -3.30610214 -4.2740036 -2.338200706
## educational-debt_consolidation      -2.73153485 -3.8417723 -1.621297380
## home_improvement-debt_consolidation -3.87895799 -4.7256402 -3.032275818
## major_purchase-debt_consolidation   -3.91657638 -4.9108777 -2.922275050
## small_business-debt_consolidation   -3.28316633 -4.1357292 -2.430603493
## home_improvement-educational        -1.14742314 -2.4713759  0.176529618
## major_purchase-educational          -1.18504153 -2.6079313  0.237848255
## small_business-educational          -0.55163148 -1.8793527  0.776089726
## major_purchase-home_improvement     -0.03761839 -1.2659745  1.190737746
## small_business-home_improvement      0.59579166 -0.5209389  1.712522180
## small_business-major_purchase        0.63341005 -0.5990069  1.865826982
##                                         p adj
## credit_card-all_other               0.0000000
## debt_consolidation-all_other        0.0000000
## educational-all_other               0.9933845
## home_improvement-all_other          0.0520775
## major_purchase-all_other            0.1149573
## small_business-all_other            0.9649306
## debt_consolidation-credit_card      0.9999999
## educational-credit_card             0.0000000
## home_improvement-credit_card        0.0000000
## major_purchase-credit_card          0.0000000
## small_business-credit_card          0.0000000
## educational-debt_consolidation      0.0000000
## home_improvement-debt_consolidation 0.0000000
## major_purchase-debt_consolidation   0.0000000
## small_business-debt_consolidation   0.0000000
## home_improvement-educational        0.1399662
## major_purchase-educational          0.1757487
## small_business-educational          0.8845620
## major_purchase-home_improvement     1.0000000
## small_business-home_improvement     0.6995489
## small_business-major_purchase       0.7355365
aovfico=aov(fico ~ purpose, data = loans)
aovficosummary=summary(aovfico)
aovficosummary
##               Df   Sum Sq Mean Sq F value Pr(>F)    
## purpose        6   477491   79582   57.14 <2e-16 ***
## Residuals   9571 13330261    1393                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aovficoturkey=TukeyHSD(aovfico)
aovficoturkey
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = fico ~ purpose, data = loans)
## 
## $purpose
##                                           diff         lwr        upr     p adj
## credit_card-all_other                -5.717275  -9.5635101 -1.8710409 0.0002382
## debt_consolidation-all_other        -11.472691 -14.3461837 -8.5991986 0.0000000
## educational-all_other                -7.061260 -13.4258502 -0.6966688 0.0184988
## home_improvement-all_other            9.461983   4.5170846 14.4068814 0.0000004
## major_purchase-all_other              7.159374   1.4224493 12.8962990 0.0043919
## small_business-all_other              4.644633  -0.3316207  9.6208869 0.0858251
## debt_consolidation-credit_card       -5.755416  -9.3132762 -2.1975551 0.0000384
## educational-credit_card              -1.343984  -8.0454337  5.3574656 0.9970747
## home_improvement-credit_card         15.179258   9.8077194 20.5507975 0.0000000
## major_purchase-credit_card           12.876650   6.7681539 18.9851453 0.0000000
## small_business-credit_card           10.361909   4.9614906 15.7623265 0.0000003
## educational-debt_consolidation        4.411432  -1.7831520 10.6060152 0.3525562
## home_improvement-debt_consolidation  20.934674  16.2106006 25.6587477 0.0000000
## major_purchase-debt_consolidation    18.632065  13.0843488 24.1797818 0.0000000
## small_business-debt_consolidation    16.117324  11.3604394 20.8742090 0.0000000
## home_improvement-educational         16.523243   9.1362318 23.9102532 0.0000000
## major_purchase-educational           14.220634   6.2816026 22.1596647 0.0000027
## small_business-educational           11.705893   4.2978559 19.1139294 0.0000657
## major_purchase-home_improvement      -2.302609  -9.1562370  4.5510193 0.9561484
## small_business-home_improvement      -4.817350 -11.0481615  1.4134617 0.2537562
## small_business-major_purchase        -2.514741  -9.3910264  4.3615443 0.9345421
aovcrline=aov(days.with.cr.line ~ purpose, data = loans)
aovcrlinesummary=summary(aovcrline)
aovcrlinesummary
##               Df    Sum Sq   Mean Sq F value Pr(>F)    
## purpose        6 7.136e+08 118941264    19.3 <2e-16 ***
## Residuals   9571 5.900e+10   6164006                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aovcrlineturkey=TukeyHSD(aovcrline)
aovcrlineturkey
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = days.with.cr.line ~ purpose, data = loans)
## 
## $purpose
##                                           diff         lwr        upr     p adj
## credit_card-all_other                545.29979   289.42551  801.17407 0.0000000
## debt_consolidation-all_other         221.33099    30.16926  412.49272 0.0114478
## educational-all_other               -303.11043  -726.52066  120.29980 0.3460038
## home_improvement-all_other           890.28941   561.32551 1219.25331 0.0000000
## major_purchase-all_other              14.26296  -367.39123  395.91714 0.9999998
## small_business-all_other             580.40963   249.35978  911.45947 0.0000050
## debt_consolidation-credit_card      -323.96880  -560.65874  -87.27887 0.0010731
## educational-credit_card             -848.41022 -1294.23030 -402.59014 0.0000004
## home_improvement-credit_card         344.98962   -12.35694  702.33618 0.0665993
## major_purchase-credit_card          -531.03684  -937.41011 -124.66356 0.0022503
## small_business-credit_card            35.10984  -324.15792  394.37759 0.9999536
## educational-debt_consolidation      -524.44141  -936.54177 -112.34106 0.0033319
## home_improvement-debt_consolidation  668.95842   354.68510  983.23175 0.0000000
## major_purchase-debt_consolidation   -207.06803  -576.13496  161.99889 0.6465642
## small_business-debt_consolidation    359.07864    42.62252  675.53476 0.0144457
## home_improvement-educational        1193.39984   701.97218 1684.82750 0.0000000
## major_purchase-educational           317.37338  -210.77793  845.52470 0.5671043
## small_business-educational           883.52005   390.69362 1376.34649 0.0000026
## major_purchase-home_improvement     -876.02645 -1331.97035 -420.08256 0.0000003
## small_business-home_improvement     -309.87978  -724.39024  104.63067 0.2929493
## small_business-major_purchase        566.14667   108.69548 1023.59786 0.0049222
aovrbal=aov(revol.bal ~ purpose, data = loans)
aovrbalsummary=summary(aovrbal)
aovrbalsummary
##               Df    Sum Sq   Mean Sq F value Pr(>F)    
## purpose        6 2.114e+11 3.524e+10   31.52 <2e-16 ***
## Residuals   9571 1.070e+13 1.118e+09                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aovrbalturkey=TukeyHSD(aovrbal)
aovrbalturkey
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = revol.bal ~ purpose, data = loans)
## 
## $purpose
##                                            diff          lwr         upr
## credit_card-all_other                10296.9807   6850.81493  13743.1465
## debt_consolidation-all_other          4263.6707   1689.06653   6838.2750
## educational-all_other                -2054.1419  -7756.71522   3648.4313
## home_improvement-all_other            4445.7169     15.16559   8876.2681
## major_purchase-all_other             -5601.5868 -10741.78137   -461.3922
## small_business-all_other             14698.1637  10239.51843  19156.8089
## debt_consolidation-credit_card       -6033.3100  -9221.09712  -2845.5228
## educational-credit_card             -12351.1226 -18355.51622  -6346.7291
## home_improvement-credit_card         -5851.2638 -10664.07848  -1038.4492
## major_purchase-credit_card          -15898.5675 -21371.68365 -10425.4514
## small_business-credit_card            4401.1830   -437.50668   9239.8726
## educational-debt_consolidation       -6317.8127 -11868.06227   -767.5631
## home_improvement-debt_consolidation    182.0461  -4050.64959   4414.7418
## major_purchase-debt_consolidation    -9865.2576 -14835.92436  -4894.5907
## small_business-debt_consolidation    10434.4929   6172.39887  14696.5870
## home_improvement-educational          6499.8588   -118.78670  13118.5043
## major_purchase-educational           -3547.4449 -10660.69193   3565.8022
## small_business-educational           16752.3056  10114.82106  23389.7901
## major_purchase-home_improvement     -10047.3037 -16188.04681  -3906.5605
## small_business-home_improvement      10252.4468   4669.73747  15835.1561
## small_business-major_purchase        20299.7505  14138.70680  26460.7941
##                                         p adj
## credit_card-all_other               0.0000000
## debt_consolidation-all_other        0.0000218
## educational-all_other               0.9389848
## home_improvement-all_other          0.0485656
## major_purchase-all_other            0.0223352
## small_business-all_other            0.0000000
## debt_consolidation-credit_card      0.0000005
## educational-credit_card             0.0000000
## home_improvement-credit_card        0.0062412
## major_purchase-credit_card          0.0000000
## small_business-credit_card          0.1027922
## educational-debt_consolidation      0.0139363
## home_improvement-debt_consolidation 0.9999996
## major_purchase-debt_consolidation   0.0000001
## small_business-debt_consolidation   0.0000000
## home_improvement-educational        0.0581190
## major_purchase-educational          0.7623958
## small_business-educational          0.0000000
## major_purchase-home_improvement     0.0000293
## small_business-home_improvement     0.0000013
## small_business-major_purchase       0.0000000
aovrutil=aov(revol.util ~ purpose, data = loans)
aovrutilsummary=summary(aovrutil)
aovrutilsummary
##               Df  Sum Sq Mean Sq F value Pr(>F)    
## purpose        6  626354  104392   134.4 <2e-16 ***
## Residuals   9571 7435913     777                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aovrutilturkey=TukeyHSD(aovrutil)
aovrutilturkey
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = revol.util ~ purpose, data = loans)
## 
## $purpose
##                                            diff        lwr         upr
## credit_card-all_other                13.8881545  11.015499  16.7608104
## debt_consolidation-all_other         14.4131833  12.267044  16.5593226
## educational-all_other                -0.9111547  -5.664707   3.8423980
## home_improvement-all_other           -5.4376945  -9.130915  -1.7444743
## major_purchase-all_other             -7.2544262 -11.539191  -2.9696613
## small_business-all_other              0.3581153  -3.358524   4.0747541
## debt_consolidation-credit_card        0.5250287  -2.132248   3.1823053
## educational-credit_card             -14.7993093 -19.804453  -9.7941651
## home_improvement-credit_card        -19.3258490 -23.337716 -15.3139816
## major_purchase-credit_card          -21.1425807 -25.704862 -16.5802989
## small_business-credit_card          -13.5300392 -17.563476  -9.4966029
## educational-debt_consolidation      -15.3243380 -19.950917 -10.6977593
## home_improvement-debt_consolidation -19.8508778 -23.379170 -16.3225861
## major_purchase-debt_consolidation   -21.6676094 -25.811059 -17.5241595
## small_business-debt_consolidation   -14.0550680 -17.607866 -10.5022704
## home_improvement-educational         -4.5265398 -10.043712   0.9906327
## major_purchase-educational           -6.3432714 -12.272734  -0.4138089
## small_business-educational            1.2692700  -4.263606   6.8021463
## major_purchase-home_improvement      -1.8167317  -6.935534   3.3020708
## small_business-home_improvement       5.7958098   1.142173  10.4494463
## small_business-major_purchase         7.6125415   2.476817  12.7482660
##                                         p adj
## credit_card-all_other               0.0000000
## debt_consolidation-all_other        0.0000000
## educational-all_other               0.9977274
## home_improvement-all_other          0.0002872
## major_purchase-all_other            0.0000125
## small_business-all_other            0.9999573
## debt_consolidation-credit_card      0.9973079
## educational-credit_card             0.0000000
## home_improvement-credit_card        0.0000000
## major_purchase-credit_card          0.0000000
## small_business-credit_card          0.0000000
## educational-debt_consolidation      0.0000000
## home_improvement-debt_consolidation 0.0000000
## major_purchase-debt_consolidation   0.0000000
## small_business-debt_consolidation   0.0000000
## home_improvement-educational        0.1903721
## major_purchase-educational          0.0269245
## small_business-educational          0.9938750
## major_purchase-home_improvement     0.9430705
## small_business-home_improvement     0.0045156
## small_business-major_purchase       0.0002518
aov6mts=aov(inq.last.6mths ~ purpose, data = loans)
aov6mtssummary=summary(aov6mts)
aov6mtssummary
##               Df Sum Sq Mean Sq F value   Pr(>F)    
## purpose        6    298   49.68   10.32 1.99e-11 ***
## Residuals   9571  46065    4.81                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aov6mtsturkey=TukeyHSD(aov6mts)
aov6mtsturkey
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = inq.last.6mths ~ purpose, data = loans)
## 
## $purpose
##                                             diff          lwr         upr
## credit_card-all_other               -0.259023456 -0.485124080 -0.03292283
## debt_consolidation-all_other        -0.185042944 -0.353960999 -0.01612489
## educational-all_other                0.207723759 -0.166418248  0.58186577
## home_improvement-all_other           0.294672824  0.003987328  0.58535832
## major_purchase-all_other            -0.083574585 -0.420819301  0.25367013
## small_business-all_other             0.287260489 -0.005268232  0.57978921
## debt_consolidation-credit_card       0.073980512 -0.135168064  0.28312909
## educational-credit_card              0.466747215  0.072802982  0.86069145
## home_improvement-credit_card         0.553696280  0.237930742  0.86946182
## major_purchase-credit_card           0.175448872 -0.183638606  0.53453635
## small_business-credit_card           0.546283946  0.228820765  0.86374713
## educational-debt_consolidation       0.392766703  0.028618552  0.75691485
## home_improvement-debt_consolidation  0.479715768  0.202011443  0.75742009
## major_purchase-debt_consolidation    0.101468359 -0.224653755  0.42759047
## small_business-debt_consolidation    0.472303433  0.192670303  0.75193656
## home_improvement-educational         0.086949065 -0.347295824  0.52119396
## major_purchase-educational          -0.291298343 -0.757993710  0.17539702
## small_business-educational           0.079536730 -0.355944176  0.51501764
## major_purchase-home_improvement     -0.378247409 -0.781137446  0.02464263
## small_business-home_improvement     -0.007412335 -0.373690148  0.35886548
## small_business-major_purchase        0.370835074 -0.033386867  0.77505701
##                                         p adj
## credit_card-all_other               0.0129519
## debt_consolidation-all_other        0.0211591
## educational-all_other               0.6580146
## home_improvement-all_other          0.0444600
## major_purchase-all_other            0.9907182
## small_business-all_other            0.0581489
## debt_consolidation-credit_card      0.9439626
## educational-credit_card             0.0086671
## home_improvement-credit_card        0.0000049
## major_purchase-credit_card          0.7795481
## small_business-credit_card          0.0000082
## educational-debt_consolidation      0.0248091
## home_improvement-debt_consolidation 0.0000074
## major_purchase-debt_consolidation   0.9699015
## small_business-debt_consolidation   0.0000133
## home_improvement-educational        0.9971007
## major_purchase-educational          0.5203523
## small_business-educational          0.9982673
## major_purchase-home_improvement     0.0822557
## small_business-home_improvement     1.0000000
## small_business-major_purchase       0.0969396
aov2yrs=aov(delinq.2yrs ~ purpose, data = loans)
aov2yrssummary=summary(aov2yrs)
aov2yrssummary
##               Df Sum Sq Mean Sq F value Pr(>F)
## purpose        6    1.4  0.2261   0.758  0.603
## Residuals   9571 2855.9  0.2984
aov2yrsturkey=TukeyHSD(aov2yrs)
aov2yrsturkey
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = delinq.2yrs ~ purpose, data = loans)
## 
## $purpose
##                                             diff         lwr        upr
## credit_card-all_other               -0.028404112 -0.08470194 0.02789372
## debt_consolidation-all_other        -0.016496189 -0.05855587 0.02556349
## educational-all_other               -0.022316777 -0.11547610 0.07084255
## home_improvement-all_other          -0.043026219 -0.11540533 0.02935289
## major_purchase-all_other            -0.005838136 -0.08981024 0.07813397
## small_business-all_other            -0.024662327 -0.09750039 0.04817574
## debt_consolidation-credit_card       0.011907923 -0.04016894 0.06398478
## educational-credit_card              0.006087334 -0.09200264 0.10417731
## home_improvement-credit_card        -0.014622108 -0.09324601 0.06400180
## major_purchase-credit_card           0.022565975 -0.06684486 0.11197681
## small_business-credit_card           0.003741785 -0.07530482 0.08278839
## educational-debt_consolidation      -0.005820589 -0.09649150 0.08485032
## home_improvement-debt_consolidation -0.026530031 -0.09567690 0.04261684
## major_purchase-debt_consolidation    0.010658052 -0.07054458 0.09186069
## small_business-debt_consolidation   -0.008166138 -0.07779327 0.06146099
## home_improvement-educational        -0.020709442 -0.12883406 0.08741518
## major_purchase-educational           0.016478641 -0.09972597 0.13268325
## small_business-educational          -0.002345549 -0.11077793 0.10608683
## major_purchase-home_improvement      0.037188083 -0.06312935 0.13750551
## small_business-home_improvement      0.018363893 -0.07283729 0.10956508
## small_business-major_purchase       -0.018824190 -0.11947326 0.08182488
##                                         p adj
## credit_card-all_other               0.7522736
## debt_consolidation-all_other        0.9101492
## educational-all_other               0.9922602
## home_improvement-all_other          0.5800892
## major_purchase-all_other            0.9999938
## small_business-all_other            0.9544771
## debt_consolidation-credit_card      0.9939822
## educational-credit_card             0.9999969
## home_improvement-credit_card        0.9980816
## major_purchase-credit_card          0.9897696
## small_business-credit_card          0.9999994
## educational-debt_consolidation      0.9999962
## home_improvement-debt_consolidation 0.9185513
## major_purchase-debt_consolidation   0.9997392
## small_business-debt_consolidation   0.9998646
## home_improvement-educational        0.9977371
## major_purchase-educational          0.9995917
## small_business-educational          1.0000000
## major_purchase-home_improvement     0.9303271
## small_business-home_improvement     0.9970089
## small_business-major_purchase       0.9980198
aovpubrec=aov(pub.rec ~ purpose, data = loans)
aovpubrecsummary=summary(aovpubrec)
aovpubrecsummary
##               Df Sum Sq Mean Sq F value Pr(>F)  
## purpose        6    1.1 0.18353   2.674 0.0136 *
## Residuals   9571  656.9 0.06864                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aovpubrecturkey=TukeyHSD(aovpubrec)
aovpubrecturkey
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = pub.rec ~ purpose, data = loans)
## 
## $purpose
##                                              diff          lwr        upr
## credit_card-all_other                2.405972e-02 -0.002941177 0.05106061
## debt_consolidation-all_other         2.245991e-02  0.002287750 0.04263207
## educational-all_other               -4.316270e-03 -0.048996238 0.04036370
## home_improvement-all_other           1.872461e-02 -0.015989000 0.05343821
## major_purchase-all_other             6.871860e-06 -0.040266829 0.04028057
## small_business-all_other             8.494763e-03 -0.026438962 0.04342849
## debt_consolidation-credit_card      -1.599805e-03 -0.026576289 0.02337668
## educational-credit_card             -2.837599e-02 -0.075420734 0.01866876
## home_improvement-credit_card        -5.335110e-03 -0.043043772 0.03237355
## major_purchase-credit_card          -2.405285e-02 -0.066935005 0.01882931
## small_business-credit_card          -1.556495e-02 -0.053476348 0.02234644
## educational-debt_consolidation      -2.677618e-02 -0.070262686 0.01671032
## home_improvement-debt_consolidation -3.735306e-03 -0.036898704 0.02942809
## major_purchase-debt_consolidation   -2.245304e-02 -0.061398482 0.01649240
## small_business-debt_consolidation   -1.396515e-02 -0.047358886 0.01942859
## home_improvement-educational         2.304088e-02 -0.028816567 0.07489832
## major_purchase-educational           4.323141e-03 -0.051409532 0.06005581
## small_business-educational           1.281103e-02 -0.039194016 0.06481608
## major_purchase-home_improvement     -1.871774e-02 -0.066830788 0.02939532
## small_business-home_improvement     -1.022984e-02 -0.053970672 0.03351098
## small_business-major_purchase        8.487891e-03 -0.039784217 0.05676000
##                                         p adj
## credit_card-all_other               0.1177474
## debt_consolidation-all_other        0.0178038
## educational-all_other               0.9999567
## home_improvement-all_other          0.6884216
## major_purchase-all_other            1.0000000
## small_business-all_other            0.9916123
## debt_consolidation-credit_card      0.9999962
## educational-credit_card             0.5625641
## home_improvement-credit_card        0.9995971
## major_purchase-credit_card          0.6468635
## small_business-credit_card          0.8902906
## educational-debt_consolidation      0.5372859
## home_improvement-debt_consolidation 0.9998932
## major_purchase-debt_consolidation   0.6159793
## small_business-debt_consolidation   0.8813089
## home_improvement-educational        0.8474378
## major_purchase-educational          0.9999882
## small_business-educational          0.9910093
## major_purchase-home_improvement     0.9133346
## small_business-home_improvement     0.9932000
## small_business-major_purchase       0.9986018
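
With 21 pairwise comparisons per variable, the Tukey tables above are long to scan by eye. As a minimal sketch, the significant pairs can be pulled out programmatically. The example uses R's built-in PlantGrowth data as a stand-in for loans (an assumption, made only so the snippet runs on its own), and the helper name significant_pairs is hypothetical. The same call should work on an object such as aovrbalturkey with which = "purpose".

```r
# Hypothetical helper: keep only the pairwise comparisons whose adjusted
# p-value is below alpha, ordered from largest to smallest absolute difference.
significant_pairs <- function(tukey, which, alpha = 0.05) {
  tab <- tukey[[which]]                      # matrix with columns diff, lwr, upr, "p adj"
  sig <- tab[tab[, "p adj"] < alpha, , drop = FALSE]
  sig[order(-abs(sig[, "diff"])), , drop = FALSE]
}

# Stand-in data: PlantGrowth ships with base R (datasets package).
fit   <- aov(weight ~ group, data = PlantGrowth)
tukey <- TukeyHSD(fit)
sig   <- significant_pairs(tukey, which = "group")
print(sig)
```

Filtering this way keeps the family-wise adjustment made by TukeyHSD(), because it only selects rows from the already adjusted table.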

3.3 Chi Squared Test

The chi-square test of independence determines whether there is an association between two categorical variables, i.e., whether the variables are independent or related. The null hypothesis is that the two variables are independent.

chi-square test for purpose vs credit policy

test = chisq.test(table(loans$purpose,loans$credit.policy))
test
## 
##  Pearson's Chi-squared test
## 
## data:  table(loans$purpose, loans$credit.policy)
## X-squared = 21.958, df = 6, p-value = 0.001232
test$observed
##                     
##                      FALSE TRUE
##   all_other            496 1835
##   credit_card          242 1020
##   debt_consolidation   734 3223
##   educational           89  254
##   home_improvement     117  512
##   major_purchase        66  371
##   small_business       124  495
test$expected
##                     
##                          FALSE      TRUE
##   all_other          454.61558 1876.3844
##   credit_card        246.12821 1015.8718
##   debt_consolidation 771.73481 3185.2652
##   educational         66.89539  276.1046
##   home_improvement   122.67404  506.3260
##   major_purchase      85.22823  351.7718
##   small_business     120.72374  498.2763
test$residuals
##                     
##                           FALSE       TRUE
##   all_other           1.9409518 -0.9553797
##   credit_card        -0.2631365  0.1295217
##   debt_consolidation -1.3583388  0.6686046
##   educational         2.7026193 -1.3302894
##   home_improvement   -0.5122906  0.2521608
##   major_purchase     -2.0828002  1.0252006
##   small_business      0.2981822 -0.1467719
corrplot(test$residuals, is.cor = FALSE)

Since the p-value for “purpose” and “credit policy” (0.001232) is less than our chosen significance level (α = 0.05), we reject the null hypothesis of independence. We conclude that there is enough evidence to suggest an association between “purpose” and “credit policy”.
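
To make the mechanics of the test concrete, the statistic reported above can be rebuilt from the observed counts alone. This is a sketch of the standard Pearson calculation, not a replacement for chisq.test(); the counts are copied from test$observed, and the variable names are illustrative.

```r
# Observed counts copied from test$observed (rows: purpose, columns: credit.policy).
observed <- matrix(
  c(496, 1835,
    242, 1020,
    734, 3223,
     89,  254,
    117,  512,
     66,  371,
    124,  495),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    purpose = c("all_other", "credit_card", "debt_consolidation", "educational",
                "home_improvement", "major_purchase", "small_business"),
    credit.policy = c("FALSE", "TRUE")
  )
)

n <- sum(observed)
# Expected count under independence: (row total * column total) / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / n
# Pearson statistic: sum over all cells of (O - E)^2 / E.
x_squared <- sum((observed - expected)^2 / expected)
# Degrees of freedom: (rows - 1) * (columns - 1).
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(x_squared, df = dof, lower.tail = FALSE)

round(c(x_squared = x_squared, df = dof, p_value = p_value), 6)
```

The Pearson residuals shown in test$residuals are the signed square roots of the individual terms in that sum, (observed - expected) / sqrt(expected).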

chi-square test for purpose vs not fully paid

test = chisq.test(table(loans$purpose,loans$not.fully.paid))
test
## 
##  Pearson's Chi-squared test
## 
## data:  table(loans$purpose, loans$not.fully.paid)
## X-squared = 96.985, df = 6, p-value < 2.2e-16
test$observed
##                     
##                      FALSE TRUE
##   all_other           1944  387
##   credit_card         1116  146
##   debt_consolidation  3354  603
##   educational          274   69
##   home_improvement     522  107
##   major_purchase       388   49
##   small_business       447  172
test$expected
##                     
##                          FALSE      TRUE
##   all_other          1957.9134 373.08655
##   credit_card        1060.0115 201.98852
##   debt_consolidation 3323.6652 633.33483
##   educational         288.1014  54.89862
##   home_improvement    528.3259 100.67415
##   major_purchase      367.0563  69.94373
##   small_business      519.9264  99.07361
test$residuals
##                     
##                           FALSE       TRUE
##   all_other          -0.3144402  0.7203274
##   credit_card         1.7196643 -3.9394502
##   debt_consolidation  0.5261783 -1.2053825
##   educational        -0.8307855  1.9031843
##   home_improvement   -0.2752124  0.6304635
##   major_purchase      1.0931697 -2.5042608
##   small_business     -3.1982603  7.3266552
corrplot(test$residuals, is.cor = FALSE)

Since the p-value for “purpose” and “not fully paid” (< 2.2e-16) is less than our chosen significance level (α = 0.05), we reject the null hypothesis of independence. We conclude that there is enough evidence to suggest an association between “purpose” and “not fully paid”.
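
The test establishes that an association exists, but not where it lies. As a rough sketch, the row proportions of the observed table give the share of loans not fully paid for each purpose, which can be read alongside the residuals. The counts are copied from test$observed above; the variable names are illustrative.

```r
# Observed counts copied from test$observed (rows: purpose, columns: not.fully.paid).
observed <- matrix(
  c(1944, 387,
    1116, 146,
    3354, 603,
     274,  69,
     522, 107,
     388,  49,
     447, 172),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    purpose = c("all_other", "credit_card", "debt_consolidation", "educational",
                "home_improvement", "major_purchase", "small_business"),
    not.fully.paid = c("FALSE", "TRUE")
  )
)

# Share of each purpose's loans in each repayment category (rows sum to 1).
row_share <- prop.table(observed, margin = 1)
# Proportion not fully paid, from highest to lowest.
not_paid_rate <- sort(row_share[, "TRUE"], decreasing = TRUE)
round(not_paid_rate, 3)
```

The ordering is consistent with the residuals: small_business has the largest positive residual in the TRUE column, while credit_card and major_purchase have the most negative ones.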

chi-square test for credit policy vs not fully paid

test = chisq.test(table(loans$credit.policy,loans$not.fully.paid))
test
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table(loans$credit.policy, loans$not.fully.paid)
## X-squared = 238.38, df = 1, p-value < 2.2e-16
test$observed
##        
##         FALSE TRUE
##   FALSE  1349  519
##   TRUE   6696 1014
test$expected
##        
##            FALSE      TRUE
##   FALSE 1569.019  298.9814
##   TRUE  6475.981 1234.0186
test$residuals
##        
##             FALSE      TRUE
##   FALSE -5.554504 12.724399
##   TRUE   2.734051 -6.263232
corrplot(test$residuals, is.cor = FALSE)

Since the p-value for “credit policy” and “not fully paid” (< 2.2e-16) is less than our chosen significance level (α = 0.05), we reject the null hypothesis of independence. We conclude that there is enough evidence to suggest an association between “credit policy” and “not fully paid”.
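
Because this is a 2 x 2 table, chisq.test() applies Yates' continuity correction by default, which is why the output header differs from the two tests above. The sketch below, using the counts from test$observed, compares the corrected and uncorrected statistics and adds the phi coefficient as a rough effect-size measure. The effect-size step is an addition for illustration and is not part of the original analysis.

```r
# Observed counts copied from test$observed (rows: credit.policy, columns: not.fully.paid).
observed <- matrix(c(1349,  519,
                     6696, 1014),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(credit.policy  = c("FALSE", "TRUE"),
                                   not.fully.paid = c("FALSE", "TRUE")))

with_yates    <- chisq.test(observed)                   # default: correct = TRUE for 2 x 2
without_yates <- chisq.test(observed, correct = FALSE)  # plain Pearson statistic

# Phi coefficient (equal to Cramer's V for a 2 x 2 table), from the uncorrected statistic.
phi <- sqrt(unname(without_yates$statistic) / sum(observed))

round(c(yates       = unname(with_yates$statistic),
        uncorrected = unname(without_yates$statistic),
        phi         = phi), 3)
```

With a sample this large, the correction changes the statistic only slightly, and even a modest effect size produces a very small p-value.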

3.4 Q-Q Plots for normality test:

We create a Q-Q plot for each numeric variable to assess normality. These plots are a visual check rather than a formal test. The easiest way to interpret them is to see how closely the points follow the black reference line, which represents a normal distribution for that variable.

# By gathering the variables we want to see into a long format with the gather() function, we can then create a Q-Q plot
# for each variable using the facet_wrap() function in ggplot2.
loans %>%
  select(int.rate, installment, log.annual.inc, dti, fico, days.with.cr.line, revol.bal, revol.util,
           inq.last.6mths, delinq.2yrs, pub.rec) %>%
  gather(variable, value) %>%
  ggplot(aes(sample = value)) +
  geom_qq(color = "steelblue") +
  geom_qq_line() +
  facet_wrap(~ variable, scales = "free") + # Free scales so the graphs are readable
  labs(title = "Q-Q Plots of Numeric Variables", x = "Theoretical", y = "Sample") +
  theme_minimal()
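
To show what a Q-Q plot compares, the sketch below pairs the sorted sample with the matching normal quantiles and summarises how straight the resulting line is with a correlation. The data are simulated (an assumption, so the snippet is self-contained), and qq_correlation is a hypothetical helper, not part of ggplot2.

```r
set.seed(42)  # simulated data, not the loans data
normal_sample <- rnorm(1000, mean = 10, sd = 2)
skewed_sample <- rexp(1000, rate = 1)

# Correlation between the sorted sample and the corresponding normal quantiles.
# Values close to 1 mean the Q-Q points lie close to a straight line.
qq_correlation <- function(x) {
  x <- sort(x[is.finite(x)])                # drop NA / -Inf / Inf before comparing
  theoretical <- qnorm(ppoints(length(x)))  # normal quantiles at evenly spread probabilities
  cor(theoretical, x)
}

round(c(normal = qq_correlation(normal_sample),
        skewed = qq_correlation(skewed_sample)), 4)
```

A strongly right-skewed variable curves away from the line, so its correlation falls noticeably below that of a normal sample.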

4 CONCLUSION AND NEXT STEPS:

Risks to our analysis and opportunities for future analyses:

Private individuals historically made up the bulk of lenders in P2P markets. However, high interest rates and the prospect of risky borrowers undermined P2P lending as a legitimate financial industry. Combined with the push for more growth by intermediaries like LendingClub, these concerns began to prompt higher lending standards and discussions about more regulation.

By 2017, shortly after the peak of the P2P industry, larger institutions and banks began to overtake private individuals as the primary source of lending in P2P markets. We suspect that this shift in P2P lenders altered the makeup of who receives what, making recent research on P2P loans as an investment opportunity a less reliable guide for today’s prospective individual investors.

5 FOLLOW-UP:

5.1 Revolving Balance

While the annual income data was given to us as a log, the revolving balance was given to us unmodified. We discovered that taking the log of the revol.bal variable gives a distribution that looks more normal, but we encountered an issue: some loans have a revol.bal value of 0, and the log of 0 is -Inf. We will need to decide how to handle this in the future. For now, we want to demonstrate the results of taking the log of revol.bal: it increases the readability of the data, and the variable more closely resembles a normal distribution, which is generally a good characteristic for further modelling.

# Overwrites revol.bal in place; loans with a balance of 0 become -Inf
loans$revol.bal=log(loans$revol.bal)
loans %>%
  select(int.rate, installment, log.annual.inc, dti, fico, days.with.cr.line, revol.bal, revol.util,
           inq.last.6mths, delinq.2yrs, pub.rec) %>%
  gather(variable, value) %>%
  ggplot(aes(x = value)) +
  geom_histogram(fill = "steelblue", color = "black") +
  facet_wrap(~ variable, scales = "free") + # Free scales so the graphs are readable
  labs(title = "Histograms of Numeric Variables", x = "Value", y = "Count") +
  theme_minimal()
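
One possible way to handle the zero balances, offered here only as a sketch and not as the approach taken in this report, is to use log1p(), which computes log(1 + x) and therefore maps a balance of 0 to 0 instead of -Inf. The balances below are made up for illustration.

```r
revol_bal_example <- c(0, 1, 250, 16500, 1200000)  # made-up balances, including a zero

plain_log  <- log(revol_bal_example)    # first element is -Inf
shifted    <- log1p(revol_bal_example)  # log(1 + x): the zero balance stays at 0
round_trip <- expm1(shifted)            # expm1() is the inverse of log1p()

data.frame(balance = revol_bal_example, plain_log, shifted, round_trip)
```

For balances in the thousands the two transforms are nearly identical, so the shape of the histogram should change very little. Dropping the zero-balance loans before taking the log is another option, at the cost of losing those rows.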

loans %>%
  select(int.rate, installment, log.annual.inc, dti, fico, days.with.cr.line, revol.bal, revol.util,
           inq.last.6mths, delinq.2yrs, pub.rec) %>%
  gather(variable, value) %>%
  ggplot(aes(x = value)) +
  geom_boxplot(fill = "steelblue", color = "black",
               outlier.size = 2, outlier.alpha = 0.2) + # Translucent and larger outliers to help with overplotting
  facet_wrap(~ variable, scales = "free") + # Free scales so the graphs are readable
  labs(title = "Boxplots of Numeric Variables", x = "Value") +
  theme_minimal() +
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())

# By gathering the variables we want to see into a long format with the gather() function, we can then create a boxplot
# for each variable using the facet_wrap() function in ggplot2. We can see this for each credit policy value by excluding
# it in the gather() function.
loans %>%
  select(int.rate, installment, log.annual.inc, dti, fico, days.with.cr.line, revol.bal, revol.util,
           inq.last.6mths, delinq.2yrs, pub.rec, credit.policy) %>%
  gather(variable, value, -credit.policy) %>%
  ggplot(aes(x = value, y = as.logical(credit.policy), fill = as.logical(credit.policy))) +
  geom_boxplot(outlier.size = 2, outlier.alpha = 0.2) +  # Translucent and larger outliers to help with overplotting
  guides(fill = guide_legend(reverse = TRUE)) + # So the legend order matches the order in the graphs
  facet_wrap(~ variable, scales = "free_x") + # Free x scale so the graphs are readable
  labs(title = "Boxplots of Numeric Variables", subtitle = "Comparing `credit.policy` Values",
       x = "Value", y = "Credit Policy", fill = "Credit Policy") +
  theme_minimal()

# By gathering the variables we want to see into a long format with the gather() function, we can then create a boxplot
# for each variable using the facet_wrap() function in ggplot2. We can see this for each not fully paid value by excluding
# it in the gather() function.
loans %>%
  select(int.rate, installment, log.annual.inc, dti, fico, days.with.cr.line, revol.bal, revol.util,
           inq.last.6mths, delinq.2yrs, pub.rec, not.fully.paid) %>%
  gather(variable, value, -not.fully.paid) %>%
  ggplot(aes(x = value, y = as.logical(not.fully.paid), fill = as.logical(not.fully.paid))) +
  geom_boxplot(outlier.size = 2, outlier.alpha = 0.2) +  # Translucent and larger outliers to help with overplotting
  guides(fill = guide_legend(reverse = TRUE)) + # So the legend order matches the order in the graphs
  facet_wrap(~ variable, scales = "free_x") + # Free x scale so the graphs are readable
  labs(title = "Boxplots of Numeric Variables", subtitle = "Comparing `not.fully.paid` Values",
       x = "Value", y = "Not Fully Paid", fill = "Not Fully Paid") +
  theme_minimal()

# By gathering the variables we want to see into a long format with the gather() function, we can then create a boxplot
# for each variable using the facet_wrap() function in ggplot2. We can see this for each purpose value by excluding
# it in the gather() function.
loans %>%
  select(int.rate, installment, log.annual.inc, dti, fico, days.with.cr.line, revol.bal, revol.util,
           inq.last.6mths, delinq.2yrs, pub.rec, purpose) %>%
  gather(variable, value, -purpose) %>%
  ggplot(aes(x = value, y = purpose, fill = purpose)) +
  geom_boxplot(outlier.size = 2, outlier.alpha = 0.2) +
  guides(fill = guide_legend(reverse = TRUE)) + # So the legend order matches the order in the graphs
  facet_wrap(~ variable, scales = "free_x") + # Free x scale so the graphs are readable
  labs(title = "Boxplots of Numeric Variables", subtitle = "Comparing `purpose` Values",
       x = "Value", y = "Purpose", fill = "Purpose") +
  theme_minimal()

loans %>%
  select(int.rate, installment, log.annual.inc, dti, fico, days.with.cr.line, revol.bal, revol.util,
           inq.last.6mths, delinq.2yrs, pub.rec) %>%
  gather(variable, value) %>%
  ggplot(aes(sample = value)) +
  geom_qq(color = "steelblue") +
  geom_qq_line() +
  facet_wrap(~ variable, scales = "free") + # Free scales so the graphs are readable
  labs(title = "Q-Q Plots of Numeric Variables", x = "Theoretical", y = "Sample") +
  theme_minimal()

6 ANNEXURE:

We performed z-interval tests for each variable, but decided that t-intervals were more appropriate. We are keeping the code here so that we don’t lose the work. Please note that the z-test is not appropriate in this case because we do not know the standard deviation of the population. However, we retain it to better understand how the z-statistic works and how it compares to the t-statistic.
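
As a small sketch of how the two intervals differ, the code below builds both by hand from the same standard error. The data are simulated (an assumption, so the snippet runs without the loans file), and the sample standard deviation is plugged into both intervals, which is exactly the situation where the t critical value is the appropriate one.

```r
set.seed(1)
x <- rnorm(30, mean = 0.12, sd = 0.03)  # simulated values, not the loans data

n    <- length(x)
xbar <- mean(x)
se   <- sd(x) / sqrt(n)  # standard error based on the sample standard deviation

z_ci <- xbar + c(-1, 1) * qnorm(0.975) * se           # normal critical value
t_ci <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se  # Student t critical value

rbind(z = z_ci, t = t_ci)

# Gap between the two critical values at this dataset's size (n = 9578, df = 9577).
qt(0.975, df = 9577) - qnorm(0.975)
```

The t-interval is always slightly wider for the same standard error. With thousands of observations, as in the loans data, the two critical values are nearly equal, so the practical difference between the intervals is small.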

6.1 Z-tests:

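# Undo the log transform from section 5.1 so that revol.bal is back on its original scale
loans$revol.bal=exp(loans$revol.bal)
# This code will perform the z-interval tests we want, but we will show the results in a nicer looking table format
# For the purpose of these z-interval tests we are assuming that the data is normal and that the population standard
# deviation is 2.31 (an assumed value; normality does not determine it). Note that the 95% test below uses the sample
# standard deviation instead, which is why its row differs from the other two.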
loadPkg("BSDA")
ztest95rate = z.test(x=loans$int.rate, sigma.x = sd(loans$int.rate)) # default conf.level = 0.95
ztest99rate = z.test(x=loans$int.rate, sigma.x = 2.31, conf.level=0.99 )
ztest50rate = z.test(x=loans$int.rate, sigma.x = 2.31, conf.level=0.50 )

tab <- map_df(list(ztest95rate, ztest99rate, ztest50rate), tidy)
tab
## # A tibble: 3 × 7
##   estimate statistic     p.value conf.low conf.high method            alternat…¹
##      <dbl>     <dbl>       <dbl>    <dbl>     <dbl> <chr>             <chr>     
## 1    0.123    447.   0             0.122      0.123 One-sample z-Test two.sided 
## 2    0.123      5.20 0.000000204   0.0618     0.183 One-sample z-Test two.sided 
## 3    0.123      5.20 0.000000204   0.107      0.139 One-sample z-Test two.sided 
## # … with abbreviated variable name ¹​alternative
png("z1.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
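
The gap between the first row (statistic 447) and the other two (statistic 5.20) follows from the value passed as sigma.x. As a sketch, the z-statistic and the 99% interval for the second row can be rebuilt by hand. Here n = 9578 is taken from the ANOVA degrees of freedom reported earlier (6 + 9571 + 1), the estimate is the rounded 0.123 from the table, and mu0 = 0 is the default null value of z.test(); the variable names are illustrative.

```r
n     <- 9578   # 6 + 9571 + 1, from the ANOVA degrees of freedom
xbar  <- 0.123  # rounded estimate from the table above
sigma <- 2.31   # assumed population standard deviation passed as sigma.x
mu0   <- 0      # default null value in z.test()

se    <- sigma / sqrt(n)                      # standard error of the mean
z     <- (xbar - mu0) / se                    # z-statistic
ci_99 <- xbar + c(-1, 1) * qnorm(0.995) * se  # 99% z-interval

round(c(z = z, lower = ci_99[1], upper = ci_99[2]), 4)
```

Because the estimate is rounded, the result matches the table only to about two significant digits. An assumed sigma of 2.31 is far larger than the spread of a variable whose mean is about 0.12, which is why the intervals in rows 2 and 3 are so much wider than the one in row 1.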
ztest95installment = z.test(x=loans$installment, sigma.x = 2.31) # default conf.level = 0.95
ztest99installment = z.test(x=loans$installment, sigma.x = 2.31, conf.level=0.99 )
ztest50installment = z.test(x=loans$installment, sigma.x = 2.31, conf.level=0.50 )

tab <- map_df(list(ztest95installment,ztest99installment,ztest50installment), tidy)
tab
## # A tibble: 3 × 7
##   estimate statistic p.value conf.low conf.high method            alternative
##      <dbl>     <dbl>   <dbl>    <dbl>     <dbl> <chr>             <chr>      
## 1     319.    13519.       0     319.      319. One-sample z-Test two.sided  
## 2     319.    13519.       0     319.      319. One-sample z-Test two.sided  
## 3     319.    13519.       0     319.      319. One-sample z-Test two.sided
png("z2.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
ztest95annual = z.test(x=loans$log.annual.inc, sigma.x = 2.31) # default conf.level = 0.95
ztest99annual = z.test(x=loans$log.annual.inc, sigma.x = 2.31, conf.level=0.99 )
ztest50annual = z.test(x=loans$log.annual.inc, sigma.x = 2.31, conf.level=0.50 )

tab <- map_df(list(ztest95annual,ztest99annual,ztest50annual), tidy)
tab
## # A tibble: 3 × 7
##   estimate statistic p.value conf.low conf.high method            alternative
##      <dbl>     <dbl>   <dbl>    <dbl>     <dbl> <chr>             <chr>      
## 1     10.9      463.       0     10.9      11.0 One-sample z-Test two.sided  
## 2     10.9      463.       0     10.9      11.0 One-sample z-Test two.sided  
## 3     10.9      463.       0     10.9      10.9 One-sample z-Test two.sided
png("z3.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
ztest95fico = z.test(x=loans$fico, sigma.x = 2.31) # default conf.level = 0.95
ztest99fico = z.test(x=loans$fico, sigma.x = 2.31, conf.level=0.99 )
ztest50fico = z.test(x=loans$fico, sigma.x = 2.31, conf.level=0.50 )

tab <- map_df(list(ztest95fico,ztest99fico,ztest50fico), tidy)
tab
## # A tibble: 3 × 7
##   estimate statistic p.value conf.low conf.high method            alternative
##      <dbl>     <dbl>   <dbl>    <dbl>     <dbl> <chr>             <chr>      
## 1     711.    30116.       0     711.      711. One-sample z-Test two.sided  
## 2     711.    30116.       0     711.      711. One-sample z-Test two.sided  
## 3     711.    30116.       0     711.      711. One-sample z-Test two.sided
png("z4.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
ztest95dti = z.test(x=loans$dti, sigma.x = 2.31) # default conf.level = 0.95
ztest99dti = z.test(x=loans$dti, sigma.x = 2.31, conf.level=0.99 )
ztest50dti = z.test(x=loans$dti, sigma.x = 2.31, conf.level=0.50 )

tab <- map_df(list(ztest95dti,ztest99dti,ztest50dti), tidy)
tab
## # A tibble: 3 × 7
##   estimate statistic p.value conf.low conf.high method            alternative
##      <dbl>     <dbl>   <dbl>    <dbl>     <dbl> <chr>             <chr>      
## 1     12.6      534.       0     12.6      12.7 One-sample z-Test two.sided  
## 2     12.6      534.       0     12.5      12.7 One-sample z-Test two.sided  
## 3     12.6      534.       0     12.6      12.6 One-sample z-Test two.sided
png("z5.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
ztest95days.with.cr.line = z.test(x=loans$days.with.cr.line, sigma.x = 2.31) # default conf.level = 0.95
ztest99days.with.cr.line = z.test(x=loans$days.with.cr.line, sigma.x = 2.31, conf.level=0.99 )
ztest50days.with.cr.line = z.test(x=loans$days.with.cr.line, sigma.x = 2.31, conf.level=0.50 )

tab <- map_df(list(ztest95days.with.cr.line,ztest99days.with.cr.line,ztest50days.with.cr.line), tidy)
tab
## # A tibble: 3 × 7
##   estimate statistic p.value conf.low conf.high method            alternative
##      <dbl>     <dbl>   <dbl>    <dbl>     <dbl> <chr>             <chr>      
## 1    4561.   193225.       0    4561.     4561. One-sample z-Test two.sided  
## 2    4561.   193225.       0    4561.     4561. One-sample z-Test two.sided  
## 3    4561.   193225.       0    4561.     4561. One-sample z-Test two.sided
png("z6.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
ztest95revol.bal = z.test(x=loans$revol.bal, sigma.x = 2.31) # default conf.level = 0.95
ztest99revol.bal = z.test(x=loans$revol.bal, sigma.x = 2.31, conf.level=0.99 )
ztest50revol.bal = z.test(x=loans$revol.bal, sigma.x = 2.31, conf.level=0.50 )

tab <- map_df(list(ztest95revol.bal,ztest99revol.bal,ztest50revol.bal), tidy)
tab
## # A tibble: 3 × 7
##   estimate statistic p.value conf.low conf.high method            alternative
##      <dbl>     <dbl>   <dbl>    <dbl>     <dbl> <chr>             <chr>      
## 1   16914.   716590.       0   16914.    16914. One-sample z-Test two.sided  
## 2   16914.   716590.       0   16914.    16914. One-sample z-Test two.sided  
## 3   16914.   716590.       0   16914.    16914. One-sample z-Test two.sided
png("z7.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
ztest95revol.util = z.test(x=loans$revol.util, sigma.x = 2.31) # default conf.level = 0.95
ztest99revol.util = z.test(x=loans$revol.util, sigma.x = 2.31, conf.level=0.99 )
ztest50revol.util = z.test(x=loans$revol.util, sigma.x = 2.31, conf.level=0.50 )

tab <- map_df(list(ztest95revol.util,ztest99revol.util,ztest50revol.util), tidy)
tab
## # A tibble: 3 × 7
##   estimate statistic p.value conf.low conf.high method            alternative
##      <dbl>     <dbl>   <dbl>    <dbl>     <dbl> <chr>             <chr>      
## 1     46.8     1983.       0     46.8      46.8 One-sample z-Test two.sided  
## 2     46.8     1983.       0     46.7      46.9 One-sample z-Test two.sided  
## 3     46.8     1983.       0     46.8      46.8 One-sample z-Test two.sided
png("z8.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
ztest95inq.last.6mths = z.test(x=loans$inq.last.6mths, sigma.x = 2.31) # default conf.level = 0.95
ztest99inq.last.6mths = z.test(x=loans$inq.last.6mths, sigma.x = 2.31, conf.level=0.99 )
ztest50inq.last.6mths = z.test(x=loans$inq.last.6mths, sigma.x = 2.31, conf.level=0.50 )

tab <- map_df(list(ztest95inq.last.6mths,ztest99inq.last.6mths,ztest50inq.last.6mths), tidy)
tab
## # A tibble: 3 × 7
##   estimate statistic p.value conf.low conf.high method            alternative
##      <dbl>     <dbl>   <dbl>    <dbl>     <dbl> <chr>             <chr>      
## 1     1.58      66.8       0     1.53      1.62 One-sample z-Test two.sided  
## 2     1.58      66.8       0     1.52      1.64 One-sample z-Test two.sided  
## 3     1.58      66.8       0     1.56      1.59 One-sample z-Test two.sided
png("z9.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
ztest95delinq.2yrs = z.test(x=loans$delinq.2yrs, sigma.x = 2.31) # default conf.level = 0.95
ztest99delinq.2yrs = z.test(x=loans$delinq.2yrs, sigma.x = 2.31, conf.level=0.99 )
ztest50delinq.2yrs = z.test(x=loans$delinq.2yrs, sigma.x = 2.31, conf.level=0.50 )

tab <- map_df(list(ztest95delinq.2yrs,ztest99delinq.2yrs,ztest50delinq.2yrs), tidy)
tab
## # A tibble: 3 × 7
##   estimate statistic  p.value conf.low conf.high method            alternative
##      <dbl>     <dbl>    <dbl>    <dbl>     <dbl> <chr>             <chr>      
## 1    0.164      6.94 4.04e-12    0.117     0.210 One-sample z-Test two.sided  
## 2    0.164      6.94 4.04e-12    0.103     0.225 One-sample z-Test two.sided  
## 3    0.164      6.94 4.04e-12    0.148     0.180 One-sample z-Test two.sided
png("z10.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
ztest95pub.rec = z.test(x=loans$pub.rec, sigma.x = 2.31) # default conf.level = 0.95
ztest99pub.rec = z.test(x=loans$pub.rec, sigma.x = 2.31, conf.level=0.99 )
ztest50pub.rec = z.test(x=loans$pub.rec, sigma.x = 2.31, conf.level=0.50 )

tab <- map_df(list(ztest95pub.rec,ztest99pub.rec,ztest50pub.rec), tidy)
tab
## # A tibble: 3 × 7
##   estimate statistic p.value conf.low conf.high method            alternative
##      <dbl>     <dbl>   <dbl>    <dbl>     <dbl> <chr>             <chr>      
## 1   0.0621      2.63 0.00849  0.0159     0.108  One-sample z-Test two.sided  
## 2   0.0621      2.63 0.00849  0.00132    0.123  One-sample z-Test two.sided  
## 3   0.0621      2.63 0.00849  0.0462     0.0780 One-sample z-Test two.sided
png("z11.png", height=100, width=700)
p<-tableGrob(tab)
grid.arrange(p)
dev.off()
## quartz_off_screen 
##                 2
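
The blocks above repeat the same three calls for every variable. As a hedged sketch of how that repetition could be folded into a loop, the base-R example below defines a hypothetical helper, `z_interval()`, and applies it to every column and confidence level. The data frame `demo` is simulated; its column names only mimic two of the loans variables, and the statistic tests a mean of 0, as `z.test` does by default.

```r
# Hypothetical helper: z-interval for one numeric vector, with sigma assumed known
z_interval <- function(x, sigma, conf.level = 0.95) {
  x    <- x[!is.na(x)]
  se   <- sigma / sqrt(length(x))
  half <- qnorm(1 - (1 - conf.level) / 2) * se
  data.frame(estimate   = mean(x),
             statistic  = mean(x) / se,      # tests H0: mean = 0
             conf.low   = mean(x) - half,
             conf.high  = mean(x) + half,
             conf.level = conf.level)
}

# Simulated stand-in data (values are made up for illustration)
set.seed(1)
demo <- data.frame(fico = rnorm(500, mean = 710, sd = 38),
                   dti  = rnorm(500, mean = 12.6, sd = 6.9))

conf_levels <- c(0.95, 0.99, 0.50)

# One row per variable and confidence level
results <- do.call(rbind, lapply(names(demo), function(v) {
  rows <- do.call(rbind, lapply(conf_levels, function(cl) {
    z_interval(demo[[v]], sigma = 2.31, conf.level = cl)
  }))
  cbind(variable = v, rows)
}))

print(results)
```

Collecting everything in one data frame also means a single table can be rendered instead of one image per variable.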